Website optimization

If you’re like me, you hate waiting for websites to load and sometimes just hit the “back” button if a site is too slow. If that was your website, you are missing out on visitors and maybe even customers! What’s worse: your site will also be less visible in search engines like Google, because page speed is part of their ranking score. During a webinar by Harald Köppe and freelance SEO specialist Beatrice Köhler I learned the following tips to improve the likelihood of your page, e.g. for your freelance consulting business, being found on the internet.

The main tool used is PageSpeed Insights from Google itself which measures the loading speed and gives great ideas on how to improve specific parts of your web page. Using my own bioinformatics freelancer business page as an example (WordPress), the speed score went from a bad 25 to a good 90 after a few minutes of optimization:

Figure: PageSpeed Insights score before optimization

Figure: PageSpeed Insights score after optimization

The main steps were:
1. Compress or convert images using the online tool Squoosh.
2. Deactivate and remove unused WordPress modules, and keep the active ones up to date.
3. Use the WordPress plugin Autoptimize, e.g. to automatically optimize CSS and JavaScript and to defer the loading of images.

Figure: Image optimization with Squoosh (image compression options highlighted)

There are many more improvement options of course, but this was a very quick and impressive way to get started!
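To re-check the score after each change without opening the browser, the same measurement can be requested from the public PageSpeed Insights API (v5). The following is only a minimal sketch: the helper names (`build_psi_url`, `performance_score`, `fetch_score`) are my own for illustration, and I assume the performance score is reported as a fraction between 0 and 1 under `lighthouseResult.categories.performance.score`, which then has to be scaled to the familiar 0–100 range.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the PageSpeed Insights API, version 5.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile"):
    """Build the request URL for one page (strategy: 'mobile' or 'desktop')."""
    query = urlencode({
        "url": page_url,
        "strategy": strategy,
        "category": "performance",
    })
    return PSI_ENDPOINT + "?" + query


def performance_score(report):
    """Extract the performance score from a parsed API response (0-100)."""
    score = report["lighthouseResult"]["categories"]["performance"]["score"]
    return round(score * 100)


def fetch_score(page_url, strategy="mobile"):
    """Query the API over the network and return the score for page_url."""
    with urlopen(build_psi_url(page_url, strategy), timeout=120) as response:
        return performance_score(json.load(response))
```

Calling `fetch_score("https://example.com")` before and after an optimization step gives a quick comparison. For regular automated use, Google recommends an API key, which is left out of this sketch.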

CRAM format notes

CRAM files are compressed versions of BAM files containing (aligned) sequencing reads. They represent a further file size reduction for this type of data, which is generated in ever-increasing quantities. Where SAM files are human-readable text files optimized for short read storage, BAM files are their binary equivalent, and CRAM files are a restructured, column-oriented binary container format for even more efficient storage.

The key components of the approach are that positions are encoded in a relative way (i.e., the difference between successive positions is stored rather than the absolute value) and that these differences are stored as a Golomb code. Also, only differences to the reference genome are listed instead of the full sequence.
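To illustrate the idea of relative positions plus Golomb coding, here is a small sketch in Python. It is not the actual CRAM implementation: the function names and the parameter `m` are chosen for illustration only, and bits are kept as text strings for readability, whereas a real encoder writes packed bytes. A Golomb code with parameter `m` stores the quotient `n // m` in unary and the remainder `n % m` in truncated binary, so small gaps between sorted read positions need only a few bits.

```python
from typing import List, Tuple


def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb parameter m as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("n must be >= 0 and m must be >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()          # number of bits needed for a remainder
    if b == 0:                        # m == 1: the remainder is always 0
        return bits
    cutoff = (1 << b) - m             # remainders below cutoff use b - 1 bits
    if r < cutoff:
        return bits + format(r, "0{}b".format(b - 1))
    return bits + format(r + cutoff, "0{}b".format(b))


def golomb_decode(bits: str, m: int, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at bits[pos]; return (value, next position)."""
    q = 0
    while bits[pos] == "1":           # read the unary quotient
        q += 1
        pos += 1
    pos += 1                          # skip the terminating 0
    b = (m - 1).bit_length()
    if b == 0:
        return q, pos
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if r >= cutoff:                   # long form: one extra bit follows
        r = ((r << 1) | int(bits[pos])) - cutoff
        pos += 1
    return q * m + r, pos


def encode_positions(positions: List[int], m: int) -> Tuple[int, str]:
    """Keep the first (sorted) position as is; Golomb-code the gaps after it."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return positions[0], "".join(golomb_encode(g, m) for g in gaps)


def decode_positions(first: int, bits: str, m: int) -> List[int]:
    """Rebuild the absolute positions from the first position and the gaps."""
    positions = [first]
    pos = 0
    while pos < len(bits):
        gap, pos = golomb_decode(bits, m, pos)
        positions.append(positions[-1] + gap)
    return positions


positions = [10000, 10004, 10004, 10035, 10100]
first, bits = encode_positions(positions, m=16)
print(len(bits), "bits for", len(positions) - 1, "gaps")
print(decode_positions(first, bits, m=16) == positions)
```

The choice of `m` matters: it should roughly match the typical gap size, otherwise the unary part (for large gaps) or the remainder part (for tiny gaps) becomes wasteful.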

The compression rates achieved are shown in the graph below generated by Uppsala University:

Comparing speed: Using the C implementation for CRAM (James K. Bonfield), decoding is 1.5–1.7× slower than for BAM files, but encoding is 1.8–2.6× faster. (File size savings are reported at 34–55%.)

Additional compression can be achieved by reducing the granularity of the quality values, although this makes the compression lossy. Illumina suggested a binning of Q-scores that does not significantly affect variant calling performance.
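As a rough sketch of what such binning does to a FASTQ quality string: every Q-score is replaced by one representative value for its bin, so far fewer distinct symbols remain and the string compresses better. The bin boundaries below are an assumption for illustration (an 8-level scheme as I remember it) and should be checked against the Illumina white paper before use. The function names are hypothetical, and Phred+33 encoding is assumed.

```python
# Illustrative bins: (lowest Q, highest Q, representative Q).
# ASSUMPTION: boundaries are for demonstration; verify against the white paper.
Q_BINS = [
    (2, 9, 6),
    (10, 19, 15),
    (20, 24, 22),
    (25, 29, 27),
    (30, 34, 33),
    (35, 39, 37),
    (40, 93, 40),
]

PHRED_OFFSET = 33  # Phred+33 ASCII encoding, as used in current FASTQ files


def bin_q(q):
    """Map one Q-score to the representative value of its bin."""
    for low, high, representative in Q_BINS:
        if low <= q <= high:
            return representative
    return q  # values outside all bins (e.g. 0 or 1) are left unchanged


def bin_quality_string(qual):
    """Apply the binning to a Phred+33 encoded quality string."""
    return "".join(chr(bin_q(ord(c) - PHRED_OFFSET) + PHRED_OFFSET) for c in qual)


original = "IIIIHGF@;5#"
binned = bin_quality_string(original)
print(original, "->", binned)
print(len(set(original)), "distinct symbols before,", len(set(binned)), "after")
```

The binning cannot be undone: once the scores are replaced, the original values are gone, which is why the resulting compression is lossy.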

Binning of similar Q-scores (Illumina):

Compression achieved by Q-score binning (Illumina):

Sources and further reading:

  1. Format definition and usage
  2. cram-toolkit
  3. Detailed report at the Uppsala University
  4. SAMtools with CRAM support
  5. Original article from Markus Hsi-Yang Fritz, Rasko Leinonen, Guy Cochrane and Ewan Birney
  6. Article about the implementation in C
  7. Illumina white paper on Q-score compression

BlueFuse Multi Errors

When processing microarray or sequencing data with BlueGnome’s / Illumina’s BlueFuse Multi software, information and errors are automatically recorded in a log file. By default it should be found at

C:\ProgramData\BlueGnome\BlueFuse Multi\blueMarker.log

Specifically for the VeriSeq PGS application, one of the following error codes might be listed:

        FailNone = 1,
        FailInvalidDB = 2,
        FailInvalidModules = 3,
        FailPlatform = 4, 
        FailBAMWorkflow = 5, 
        FailGenomeBuild = 6, 
        FailSampleID = 7, 
        FailFlowcellID = 8, 
        FailSampleSheetWorkflow = 9, 
        FailSampleSheetWorkflowVersion = 10, 
        FailBAMCheckSum = 11, 
        FailSampleSheetCheckSum = 12, 
        FailBAMWorkflowVersion = 13,
        FailSampleSheetFileVersion = 14, 
        FailSampleSheetBarcode = 15, 
        FailBAMFile = 16
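If you need to translate such numeric codes while going through a log, a small lookup like the following can help. This is only a sketch: the class name `VeriSeqFail` and the helper `describe` are my own, and since the exact way the code is written in the log file is not covered here, the example simply takes the number as input.

```python
from enum import IntEnum


class VeriSeqFail(IntEnum):
    """Error codes as listed above (the class name is hypothetical)."""
    FailNone = 1
    FailInvalidDB = 2
    FailInvalidModules = 3
    FailPlatform = 4
    FailBAMWorkflow = 5
    FailGenomeBuild = 6
    FailSampleID = 7
    FailFlowcellID = 8
    FailSampleSheetWorkflow = 9
    FailSampleSheetWorkflowVersion = 10
    FailBAMCheckSum = 11
    FailSampleSheetCheckSum = 12
    FailBAMWorkflowVersion = 13
    FailSampleSheetFileVersion = 14
    FailSampleSheetBarcode = 15
    FailBAMFile = 16


def describe(code):
    """Return the name for a numeric code, or a marker for unknown codes."""
    try:
        return VeriSeqFail(code).name
    except ValueError:
        return "Unknown({})".format(code)


print(describe(11))
```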

Life as a Bioinformatics Freelancer: The tools

This post is part of a series of short articles about bioinformatics freelancing.

In this part of the story I’ll share what technology I found useful for doing my work for different projects as a consulting bioinformatics scientist. This is the current state as of the end of 2018. It might change, but it might be useful for people in similar situations.

Computer set-up:

I do most of the work remotely, i.e. from my office at home with some visits to clients where possible. There I’m using:

  • An Apple MacBook Pro running the latest OSX.
  • on a Griffin Elevator stand – When you’re sitting many hours you need to keep a good posture!
  • A number of external USB hard disks like this one with 2 TB – Don’t fill up your machine and make sure you do backups!
  • Either a TrackMan Marble (to avoid bending your wrist) or a Logitech M330 Silent Plus mouse – I don’t know why not everybody is using the silent mouse! The constant clicking and scrolling get too annoying!
  • An Apple wired aluminum keyboard (with numeric keypad)
  • Connected to an HP 27es  HDMI monitor
  • The internet comes through a TP-Link power line connection
  • with a Thunderbolt Gigabit Ethernet Adapter 

Software set-up:

Part A: Programming, etc.

  • The key is the Unix-based OS. I wouldn’t want to work without access to the powerful command-line tools, etc.
  • For small or visual Python projects the IPython / Jupyter notebooks are great
  • For larger Python (or other) projects I like PyCharm CE
  • To do any more data- or processing-intensive tasks I use machines of suitable size in the Amazon cloud.
    I keep an image (AMI) there which has the software installed that I usually need, so starting work there is quick and much cheaper than buying your own server. These cloud machines are also better secured than most on-site servers!
  • I also share data and results with my clients through S3 on the Amazon cloud. Alternatively I set up a Nextcloud storage on my web-hosting server.
  • For most code-reading and -writing as well as for note-taking I love the TextMate editor.

Part B: Project management, marketing, etc.

  • I maintain a WordPress-based website hosted at all-inkl, with some companion pages (1, 2) to drive traffic.
  • The profiles at LinkedIn and XING are of key importance in order to be found when people search for your bioinformatics service.
  • Tracking the time I spend on different projects is done with a slightly customized version of Anuko that I installed on my server.
  • Expenses and other money-related items for determining net income (Einnahmenüberschussrechnung) are tracked in MS Excel or LibreOffice first. They are then entered into the (cloud-based) tax software LexOffice for the regular VAT submissions (Umsatzsteuer-Zahlungen an das Finanzamt via ELSTER). This software is not perfect, but you can do a 30-day trial (or a 1-year trial if there are promotions) to see whether it suits you.

As you can see many of the tools are open-source or at least free software solutions.

Life as a Bioinformatics Freelancer: Finding work

This post is part of a series of short articles about bioinformatics freelancing.

So you have some free time and energy to use and extend your bioinformatics skills? One main worry of a potential (or existing) freelancer is:
  How do I find work?
  How will companies know that I am here and I am available for hire?

In my opinion, there is enough work for all of us. The field of bioinformatics can be defined quite broadly and it is still expanding. The real issue seems to be that some companies are not used to working with external freelancers / consultants and are therefore not aware of this resource! To spread the word and find interesting projects, my suggestions would be:

A – Your network: If you have worked in the field before or know people from uni, make use of these contacts! This is by far the best way to get started as you will be in a field you already know and potentially with people who like and value you already.
Let them know that you would be happy to work with them and what areas you are most skilled or interested in. Make sure they don’t feel pressured though.

B – Internet resources: Besides the direct advertising through your own web site(s), make sure you make full use of online social networks like LinkedIn and Xing. They have channeled many interested parties to me (along with random requests of course)!

In addition, a number of freelancer platforms have been created in recent years that focus on bringing companies and freelancers together. You can also leave your CV with recruitment agencies like Hays. Just remember that all of these platforms and agencies need to pay their employees as well, so they will take a (sometimes significant) cut of your earnings.
Make sure you add keywords to all sites and profiles that describe the kind of knowledge you have or the kind of work you are looking for!

Here is a list of potential sources of project work for you as a bioinformatics freelancer:

Ad campaigns are expensive and have not had the desired effect for me.