To do this research you need large cohorts of patients, but also matched controls, which are even harder to collect. Jan Veldink's team needs all of them to be whole-genome sequenced, which amounts to about ninety gigabytes of data per sample. If you want to collect tens of thousands of subjects, you need a lot of storage and compute power. The samples are sent to Illumina, a company in the US, where they are whole-genome sequenced at a rate of 600 per month, which is a high throughput. The data then comes back over the network to SURFsara, where it is stored; this also makes it possible to provide a local backup for everyone who collaborates in the project. The SURFsara copy is a working copy of the data, so all calculations on the combined dataset can be performed at SURFsara. The principal investigators (PIs) in the consortium each own their part of the data and decide whether or not to share it. If they allow it, their part of the data can be opened up so the researchers can do one large joint analysis.
At the moment, six institutes are involved. The partners want to expand this to perhaps ten or fifteen institutes in the coming months. That is necessary because ALS is a relatively uncommon disease, and the partners need international collaborators to reach the target of at least 15,000 patients with whole-genome sequences.
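The figures above give a rough sense of scale. The short Python sketch below multiplies them out; it assumes decimal units (1 TB = 1,000 GB), counts only the 15,000 patients (matched controls would add to both totals), and the helper names `storage_tb` and `months_to_sequence` are hypothetical.

```python
# Back-of-the-envelope estimate from the figures quoted in the text:
# ~90 GB of raw data per whole genome and 600 genomes sequenced per month.
# Assumption: decimal units (1 TB = 1000 GB); controls are not included.
GB_PER_SAMPLE = 90
SAMPLES_PER_MONTH = 600


def storage_tb(n_samples: int, gb_per_sample: float = GB_PER_SAMPLE) -> float:
    """Total raw storage in terabytes for n_samples whole genomes."""
    return n_samples * gb_per_sample / 1000


def months_to_sequence(n_samples: int, per_month: int = SAMPLES_PER_MONTH) -> float:
    """Months of sequencing needed at a fixed monthly throughput."""
    return n_samples / per_month


if __name__ == "__main__":
    n = 15_000
    print(f"{n} samples -> {storage_tb(n):.0f} TB of raw data")
    print(f"{n} samples -> {months_to_sequence(n):.0f} months at {SAMPLES_PER_MONTH}/month")
```

For 15,000 patients this gives 1,350 TB (about 1.35 PB) of raw data and 25 months of sequencing at 600 genomes per month, before any controls are added.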
Currently, the data is split into two parts: the raw data and the annotated data. At present the researchers mainly do their calculations on the annotated data, which can also be done locally: the PIs in the project hold local copies of the annotated data, and most analyses now run locally on HPC systems in the US and in the Netherlands, at UMC Utrecht. However, researchers who want to use the raw data to tackle other questions about the disease really need SURFsara, because analyses of the raw data are impossible to do locally within the consortium. Those analyses still have to be run.
For the annotated data, the so-called VCF files, the researchers mainly use the Lisa cluster. There they look for a burden of mutations in genes: you summarize all mutations, all variants, in a gene and compare them with those in the healthy controls. When a signal comes out, the researchers try to replicate it. This is done on Lisa. On the grid, the researchers use the raw read information, which lets them do all sorts of interesting things, for example looking for unusual duplications or repeated elements in the genome that are not present in the VCF files. This requires going through all the read information, which takes orders of magnitude more compute power than working with the VCF files.
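A gene burden analysis of the kind described above can be sketched as a simple collapsing test: for each gene, count the cases and the controls that carry at least one qualifying variant, then compare the two proportions with Fisher's exact test. This is one common form of burden test, chosen here for illustration; it is not necessarily the method Project MinE uses, and the input format (a hypothetical mapping from gene name to the set of carrier subject IDs, already filtered from the VCF files) is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls; columns are carriers/non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the carriers are cases,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    # Sum the probabilities of every table no more likely than the observed one.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def gene_burden(carriers: dict, cases: set, controls: set) -> dict:
    """Collapsing burden test per gene.

    carriers: hypothetical input mapping gene name -> set of subject IDs with
    at least one qualifying variant in that gene.
    Returns gene -> (case carriers, control carriers, p-value).
    """
    results = {}
    for gene, subjects in carriers.items():
        a = len(subjects & cases)      # cases carrying a qualifying variant
        c = len(subjects & controls)   # controls carrying a qualifying variant
        b = len(cases) - a
        d = len(controls) - c
        results[gene] = (a, c, fisher_exact_two_sided(a, b, c, d))
    return results


if __name__ == "__main__":
    cases = {f"case{i}" for i in range(10)}
    controls = {f"ctrl{i}" for i in range(10)}
    # Hypothetical toy data: GENE_A is enriched in cases, GENE_B is not.
    carriers = {
        "GENE_A": {f"case{i}" for i in range(6)},
        "GENE_B": {"case0", "ctrl0"},
    }
    results = gene_burden(carriers, cases, controls)
    for gene, (n_case, n_ctrl, p) in sorted(results.items(), key=lambda kv: kv[1][2]):
        print(f"{gene}: {n_case} case / {n_ctrl} control carriers, p = {p:.4g}")
```

In this toy example GENE_A (6 of 10 cases, 0 of 10 controls) gets p ≈ 0.011, while GENE_B gets p = 1. Real burden analyses typically also weight variants (for example by predicted effect or allele frequency) and correct for population structure and for testing thousands of genes; this sketch does neither, which is one reason a signal is then taken to replication.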
In the end you can calculate summary genetic values for every subject, such as principal components of the genotype data, plot them in 2D space and see whether or not the sample is homogeneous. This is quite standard in genetics. Beyond that, advanced visualization tools are not really necessary.
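A standard way to obtain such per-subject values is principal component analysis (PCA) of the genotype matrix. Assuming that is what is meant here, the sketch below codes each genotype as the number of alternate alleles (0, 1 or 2), centers each variant, and extracts the top two components with power iteration in plain Python. The function names are hypothetical; production pipelines use dedicated tools such as PLINK on far larger matrices.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each variant's mean alt-allele count (rows = subjects, columns = variants)."""
    n_subjects = len(genotypes)
    n_variants = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_subjects for j in range(n_variants)]
    return [[row[j] - means[j] for j in range(n_variants)] for row in genotypes]


def subject_covariance(x):
    """Subject-by-subject matrix K = X X^T / m for a centered n x m matrix X."""
    n, m = len(x), len(x[0])
    return [[sum(x[i][k] * x[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def top_eigenpairs(mat, n_components=2, iters=500, seed=0):
    """Leading eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    rng = random.Random(seed)
    n = len(mat)
    mat = [row[:] for row in mat]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break  # no variance left in any direction
            v = [t / norm for t in w]
        # Rayleigh quotient v^T M v estimates the eigenvalue
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove this component before searching for the next one
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def pca_coordinates(genotypes, n_components=2):
    """Per-subject coordinates sqrt(eigenvalue) * eigenvector entry.

    These are proportional to the classical PC scores (scaled by 1/sqrt(m)).
    """
    k = subject_covariance(center_columns(genotypes))
    pairs = top_eigenpairs(k, n_components)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(genotypes))]
```

Plotting the two coordinates per subject gives the 2D picture: a homogeneous sample forms a single cloud, while subjects from genetically distinct groups separate into clusters along the first components.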
Jan Veldink prefers precision medicine to personalized medicine. Precision medicine is all about pinpointing the exact cause of an individual's disease. We know that ALS is a collection of diseases rather than a single disease, so each small group of patients has its own mutations or variants that are responsible for the disease. Molecular biology now makes it possible to correct or silence those precise aberrations, or otherwise target them, in one patient or in one subgroup of patients. This is expected to be of far more benefit than treating everyone with a generic drug that may slow the disease a bit. The researchers hope that this approach will arrest the disease in the longer term, or do even better; that remains to be seen.
Many challenges still lie ahead: how specific is it really? How efficient is it? Is it safe? If you silence a gene, could that cause harm because the protein is needed for some other function? The researchers are not there yet, but at least they have an anchor for future development.
The project collaborates with many foundations around the globe. Many patient foundations raise money through donations and all sorts of campaigns. The project tries to bring investigators and foundations together and to help them make the case that this is a relevant project, so that the partners can raise the funds to complete it in due course.
More information is available at the MinE project website.