Workflows for genome research are extremely complex - Docker and parallel processing come to the rescue
16 Oct 2015 Frankfurt - Maria Chatzou introduced Nextflow during the Docker workshop at the ISC Cloud & Big Data Conference in Frankfurt. Nextflow is a system for managing scientific workflows for genome analysis. These workflows are complex and process very large volumes of data. Chatzou cites Nature: "Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn". The workflows consist of multiple third-party software components with many dependencies (libraries, scripts, tools, etc.). These components are research tools that change frequently, so flexibility is needed. However, Chatzou notes that maintaining a scientific production chain with controlled versions within a standard production environment is impossible in practice. That is why Nextflow was developed.
Nextflow has several useful features, including automatic parallelization, resumption of pipelines, and polyglot support. It takes care of virtualization and packaging, and is portable and platform-agnostic. Nextflow also supports sharing and collaboration. In Nextflow's distributed model, Docker is used to package tools and applications.
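To make the ideas of parallelization and resumption more concrete, here is a minimal Python sketch. It is not Nextflow code and does not reflect how Nextflow is implemented; it only illustrates the general technique of running independent inputs in parallel and skipping work whose result is already stored. All names (`run_pipeline`, `run_cached`, `gc_content`, the cache layout) are hypothetical and chosen for illustration.

```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def cache_key(task_name: str, payload: str) -> str:
    """Hash the task name and its input so unchanged work can be recognised."""
    return hashlib.sha256(f"{task_name}:{payload}".encode()).hexdigest()


def run_cached(task_name, func, payload, cache_dir: Path, log: list):
    """Run func(payload) unless a stored result for the same input exists."""
    path = cache_dir / f"{cache_key(task_name, payload)}.json"
    if path.exists():
        # Resumption: reuse the result of an earlier run.
        return json.loads(path.read_text())
    result = func(payload)
    log.append(payload)  # record that real work was done for this input
    path.write_text(json.dumps(result))
    return result


def gc_content(sequence: str) -> float:
    """Toy analysis step: fraction of G and C bases in a sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def run_pipeline(sequences, cache_dir: Path, log: list):
    """Process independent inputs in parallel, reusing stored results."""
    def step(seq):
        return run_cached("gc_content", gc_content, seq, cache_dir, log)

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(step, sequences))


if __name__ == "__main__":
    work_dir = Path(tempfile.mkdtemp())
    first_log, second_log = [], []
    run_pipeline(["GGCC", "ATAT"], work_dir, first_log)
    # The second run only computes the input it has not seen before.
    run_pipeline(["GGCC", "ATAT", "GGGA"], work_dir, second_log)
    print("computed in second run:", second_log)
```

In this sketch the stored results are keyed by a hash of the step name and its input, so a rerun after a failure or after adding new samples repeats only the missing work. A real workflow system would also have to account for changes to the code and the software environment, which is where packaging tools such as Docker come in.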
In the video recording of the presentation, Maria Chatzou demonstrates the workings of Nextflow live.