What do ecology, psychology and medical science have in common? And what digital technologies do we need to engage in those shared challenges, how do we bridge gaps between disciplines, and how do we combine expertise from different domains with expertise from computer and data science?
Starting the day with a talk show illustrated both the potential and the difficulty of bringing such diverse fields of research together. Ecologists and meteorologists discussed the possibilities of sharing data. Communication challenges quickly became apparent when psychologists and computer scientists started talking about language ("Are we talking about human language or programming language?") and about different interpretations of concepts ("What is the difference between text mining and natural language processing?").
This shows the value of organizing a day like this, where participants can work towards a shared understanding, break the jargon barrier, and inspire each other with unexpected perspectives.
After many discussions, six topics were identified as relevant for a follow-up colloquium:
1. Deep Learning in Science
Machine learning is a fast-growing and exciting field of research, and deep learning represents its state of the art. Machine Learning and Deep Learning involve feeding a computer system a lot of data, from which it learns patterns it can then apply to new data. Deep Learning enables many researchers to scale up their machine learning in ways they couldn't before.
That opens avenues to ask new questions in many fields. When there are too many features to pre-code in your models and you want to explore the field, Deep Learning frees up time for the creative part of science instead of mundane technical work.
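The "too many features to pre-code" point can be made concrete with a minimal sketch (an invented illustration, not an example from the event, and assuming only NumPy): a tiny two-layer network trained on the classic XOR problem, which a linear model on the raw inputs cannot solve - the hidden layer learns the needed features by itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR. No linear model on the raw inputs can fit it;
# the hidden layer learns the required features instead of us pre-coding them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two-layer network: hidden layer (feature learner) + output layer.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)              # learned features
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))  # sigmoid output
    return h, p

_, p = forward(X)
initial_loss = float(np.mean((p - y) ** 2))

lr = 0.5
for _ in range(2000):                     # plain full-batch gradient descent
    h, p = forward(X)
    grad_out = 2 * (p - y) * p * (1 - p) / len(X)   # d(loss)/d(output pre-activation)
    grad_h = grad_out @ W2.T * (1 - h ** 2)         # backpropagate to hidden layer
    W2 -= lr * (h.T @ grad_out); b2 -= lr * grad_out.sum(0)
    W1 -= lr * (X.T @ grad_h);   b1 -= lr * grad_h.sum(0)

_, p = forward(X)
final_loss = float(np.mean((p - y) ** 2))
print(initial_loss, final_loss)
```

Scaling the same idea up - more layers, more data, GPU clusters - is what frameworks and the infrastructure discussed below provide.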
What do we need to make progress with the application of Deep Learning in scientific research? A great start would be a symposium for the various fields of science that can profit from Deep Learning. Because the field moves so fast and is so new, it's very hard to stay up to date with current developments. You need to discuss new technological developments, but also new applications. It would be very useful to share prototype implementations from various fields, so you can see commonalities and differences - and to share the best learning material to get you started.
We need MOOCs, not books - online material that can change as the field changes. And we need access to infrastructure with specific support - for example GPU clusters, but also high-speed networks - along with examples of how to gain that access.
2. Data analytics
We use models to make predictions about the future. For example, to fight or prevent poverty, or to detect where slums are evolving in cities. To develop and analyze these models we need a set of tools.
However, there are so many methods available that we are blinded by the complexity. And we do not always know what the real truth is - because what is the truth in the future? We simply cannot evaluate these methods, and do not know which one is best. We have many different fields of expertise, but there is a gap between those fields which prevents us from combining methods in the best way.
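One common way to compare methods when "the truth in the future" is unknown is to hold out the most recent data and score each method on it. A minimal sketch (an invented illustration on synthetic data, not from the event):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
series = 0.5 * t + rng.normal(0, 2, size=100)  # synthetic trend + noise

# Hold out the last 20 points as the "future" we cannot see while fitting.
t_train, t_test = t[:80], t[80:]
train, test = series[:80], series[80:]

# Method A: always predict the historical mean.
pred_a = np.full_like(test, train.mean())

# Method B: fit a straight line to the past and extrapolate.
slope, intercept = np.polyfit(t_train, train, deg=1)
pred_b = slope * t_test + intercept

def mse(pred):
    """Mean squared error against the held-out data."""
    return float(np.mean((pred - test) ** 2))

print("mean baseline:", mse(pred_a))
print("linear trend: ", mse(pred_b))
```

The held-out score gives the two methods a common yardstick, which is exactly the kind of shared evaluation practice the gap between fields makes hard to agree on.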
A course of action could be to start bringing those diverse fields of expertise together and to develop the communication between them - not only between the technical and scientific aspects, but also between these and the methodological aspects. We need communication beyond documented code: collaborating with each other, also on an international scale, and with companies.
3. Visualization
Visualization is a way to simplify complex data and make it more attractive to people, and is therefore very important not only to go from data to information but also to inspire.
A big challenge is for domain researchers to 'trust' the visualization. In any visualization choices are made to translate a set of data into a visualization that is more easily interpreted - inherently a process in which data is manipulated.
It is important that people are educated so they understand how visualizations come about. At the same time, it is important that researchers realize what is possible by using visualization as a tool - because there is so much potential.
Visualization is very difficult to generalize. Each research question requires a different kind of visualization. To get the most out of this technology, it is therefore important that researchers are aware of the possibilities. That enables researchers to communicate their wishes, and makes it easier for computer scientists to understand those wishes.
A symposium would be a great way to show domain scientists the potential of translating their complex data into a 'simplified' visualization that helps to interpret the data and inspire other researchers.
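The point that every visualization involves choices that "manipulate" the data can be shown with a deliberately minimal sketch (an invented illustration): even a plain text histogram forces two choices - the bin width and the bar scaling - and both change what the reader sees.

```python
def ascii_histogram(values, bins=5, width=40):
    """Reduce raw values to a binned, text-only bar chart."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0   # bin width: the first choice
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)   # bar scaling: the second choice
        lines.append(f"[{lo + i * step:6.2f}, {lo + (i + 1) * step:6.2f}) {bar}")
    return "\n".join(lines)

print(ascii_histogram([1, 2, 2, 3, 3, 3, 4, 4, 9]))
```

Change `bins` and the same data tells a different story - which is why readers need to be educated about how a visualization came about before they can trust it.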
4. Multi-scale modelling
Going from a very small level to a very high level - from cells to society, from butterflies to the global climate, from the Universe to a screen: multi-scale modelling is trending.
Why do we need multi-scale modelling? One reason is that we want to take short-cuts: we cannot compute whole systems at the lowest level of detail. Another reason is that we now have data at all the different levels. This is new. And we have compute systems at different scales.
There's a good case for multi-scale modelling. The questions we asked ourselves are: Are there generic aspects that are true for multi-scale modelling in all the different domains? For example, can we find generic rules for how to separate different scales? Can we define best practices for the interfaces between the different scales? Do we have ways of validating these complex multi-scale models? Do we know how to map the multi-scale models to a multi-scale complex compute infrastructure?
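The "short-cut" idea and one candidate interface rule can be sketched in a few lines (an invented illustration, not a method from the event): a fine-scale field is averaged into coarse cells, and a simple conservation check guards the interface between the two scales.

```python
import numpy as np

# Fine-scale field: 16 cells of some quantity (e.g. a concentration profile).
fine = np.sin(np.linspace(0.0, 3.14, 16))

def coarsen(field, factor):
    """Average blocks of `factor` fine cells into one coarse cell."""
    return field.reshape(-1, factor).mean(axis=1)

coarse = coarsen(fine, 4)  # 4 coarse cells instead of 16 fine ones

# One candidate "interface rule" between the scales:
# coarsening must conserve the mean of the field.
assert abs(float(coarse.mean()) - float(fine.mean())) < 1e-12
print(len(coarse), float(coarse.mean()))
```

A coarse model can then be run on the 4-cell representation at a fraction of the cost; whether such conservation rules generalize across domains is exactly the open question above.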
There are many more questions than answers at this moment. That's why there is a real need for a workshop to work this out. The eScience Center wants to bring together scientists from different domains to sit together, see what the properties of their multi-scale models are, and extract generic aspects that the eScience Center can address.
5. Data integration
Data integration allows users to see a unified view of heterogeneous data. It involves combining data from several disparate sources, which are stored using various technologies. Data integration is becoming essential to do science.
There are four things to consider: different formats; different modalities of data; ontology - a set of concepts and categories in a subject area; and, linked to that, epistemology - different communities use different ontologies.
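The formats-and-ontology problem can be sketched in miniature (an invented illustration; the field names and values below are hypothetical): two sources describe the same entities under different names, and a small mapping layer - a mini-ontology - provides the unified view.

```python
# Two heterogeneous sources describing the same gene (hypothetical data).
source_a = [{"gene_id": "G1", "expr": 2.5}]      # e.g. a CSV-derived source
source_b = [{"gene": "G1", "activity_pct": 40}]  # e.g. a JSON-derived source

# Mini "ontology": map source-specific field names to shared concepts.
mapping = {
    "gene_id": "gene", "gene": "gene",
    "expr": "expression", "activity_pct": "activity",
}

def unify(records):
    """Rename each record's fields to the shared vocabulary."""
    return [{mapping[k]: v for k, v in r.items()} for r in records]

# Merge everything into one view, keyed by the shared "gene" concept.
unified = {}
for rec in unify(source_a) + unify(source_b):
    unified.setdefault(rec["gene"], {}).update(rec)

print(unified)
```

Real integration tools do far more (units, provenance, conflicting values), but the hard part the participants identified - agreeing on the mapping itself - is already visible here.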
Data integration challenges can be illustrated by the following two 'billion dollar research questions': 1) How can you use allele and gene information to predict the size and also the robustness of crops or plants? 2) How can we combine biological activity and chemical structure information to predict polypharmacological action of drug molecules on multiple protein targets? The participants tried to find some common themes between the research questions.
Initially, the participants realized that they have different problems. For the polypharmacological question, certain data integration tools are simply not yet available and require investment, while for crop prediction the whole infrastructure and ontology still has to be developed, raising many epistemological questions. They then realized that, when moving to a cell-based system, the polypharmacological question would face similar infrastructure and ontology problems.
That's why there is a need for a symposium presenting case studies that reflect not only the challenges of this essential work, but also its scientific success stories, to convince people of the urgency of tackling this challenge. Discussion groups will then be formed around the different aspects of data integration. And if, in the end, participants come to speak the same language on any one of those elements, that will be a big success.
6. Tools and access to e-infrastructure
The case for a symposium on streaming data is very compelling. Imagine having an up-to-date view of all your data as it streams in. The participants in this session all shared that dream. All work with networks of sensors - be they wearables, weather buoys collecting weather and ocean data for climate research, or antennas studying the universe. And it turns out that the data from these sensors needs to be studied continuously, as it comes in. The participants found that they need to identify which of this real-time processing can be done by software and which by hardware. Sometimes there is so much data that it has to be reduced within seconds.
Some of this processing can run on commodity hardware; other parts need specialized hardware. But the users don't want to know. They need this hardware to be fault-tolerant and sustainable, and they need it to have reasoning built in. Therefore, they need a one-day workshop to build enthusiasm for a flagship project on stream reasoning.
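Reducing a stream "within seconds" usually means never storing the raw readings at all. A minimal software-side sketch (an invented illustration, not from the event): each reading updates a constant-size running summary the moment it arrives.

```python
def stream_summary(readings):
    """Incrementally reduce a stream of readings to (count, mean, max)."""
    count, mean, peak = 0, 0.0, float("-inf")
    for value in readings:          # readings can be any (possibly endless) iterable
        count += 1
        mean += (value - mean) / count  # running mean: O(1) memory per sensor
        peak = max(peak, value)
    return count, mean, peak

# Simulated buoy temperatures arriving one by one.
print(stream_summary([20.1, 20.3, 19.8, 21.0]))
```

The same update-per-reading pattern is what gets pushed into hardware when software alone cannot keep up with the sensor rate.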
The topics were discussed in in-depth sessions, after which each in-depth group presented a short pitch on why the eScience Center should organize a colloquium on this topic. While each topic is absolutely worth a colloquium, the most suitable topic was judged to be "Visualization".
More information is available at the eScience 2017 website.