Karlheinz Meier asked why a computer scientist would get involved with the brain. There are two reasons. One is to support science in understanding how the brain works. The scientist can do this through experiments or computer simulation, but this is very hard and maybe even impossible. The second application of brain-inspired computing is to take the computational principles of the brain to perform generic data processing and to do what the brain does really well: finding patterns in space and time in order to make predictions. This is what is typically called cognitive computing. According to Karlheinz Meier, this is probably the most exciting long-term application of neuromorphic systems.
Sejnowski, a famous neuroscientist, calls the brain a multi-scale system. Things are happening at different scales of the brain. These are physical scales which you can observe under a microscope. There are seven orders of magnitude in spatial scale and eleven orders of magnitude in time. Brains consist of neurons that spike. The time to learn things and to self-organize amounts to months and years. Neuroscience has developed visualization methods to map the function of the brain in space and time.
The brain has neurons which are working in parallel, so it is easy to map the brain to a supercomputer. The K computer in Japan is simulating 1 billion very simple neurons on 65,000 processor nodes. The network size is just a couple of neurons. The concept of weak scaling works extremely well here, Karlheinz Meier explained. When you increase the number of compute nodes and the size of the network together, you get constant performance in terms of computing. This is not entirely true because there are considerable deviations at large scale. The absolute runtime, which is about a factor of 10^3 slower than real time, is however more interesting. This is a problem for two reasons. The system is 10 billion times less energy efficient than the human brain, and you have to wait four years for a simulated day. The timescales are simply inaccessible on conventional computers. This will not change if we move to Exascale, Karlheinz Meier pointed out.
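The two quantities in this paragraph, weak-scaling efficiency and slowdown relative to real time, can be made concrete with a short sketch. The runtimes in `hypothetical_runtimes` are invented for illustration and are not measurements from the talk; only the "four years for a simulated day" figure comes from the text.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def weak_scaling_efficiency(runtime_small, runtime_large):
    """Efficiency of a weak-scaling run.

    In weak scaling the work per node is fixed, so the ideal runtime does not
    change as nodes are added; efficiency is 1.0 in that ideal case.
    """
    return runtime_small / runtime_large


def slowdown(wall_clock_seconds, simulated_seconds):
    """How many times slower than biological real time a simulation runs."""
    return wall_clock_seconds / simulated_seconds


# Hypothetical runtimes in seconds (NOT measurements from the talk):
# node count -> time to simulate a fixed amount of work per node.
hypothetical_runtimes = {1: 100.0, 1024: 105.0, 65536: 125.0}
baseline = hypothetical_runtimes[1]
efficiencies = {
    nodes: weak_scaling_efficiency(baseline, runtime)
    for nodes, runtime in hypothetical_runtimes.items()
}

# Figure quoted in the talk: four years of waiting for one simulated day.
quoted_slowdown = slowdown(4 * SECONDS_PER_YEAR, SECONDS_PER_DAY)

if __name__ == "__main__":
    for nodes, eff in efficiencies.items():
        print(f"{nodes:>6} nodes: efficiency {eff:.2f}")
    print(f"slowdown vs. real time: about {quoted_slowdown:.0f}x")
```

Four years per simulated day works out to a slowdown of roughly 1,500, which is the same order of magnitude as the factor of 1,000 mentioned later in the talk.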
An alternative to this is neuromorphic computing. Some people say that neuromorphic computing is the building of artificial brains, but this is nonsense, according to Karlheinz Meier. We are not able to build an artificial brain because there are so many aspects of the brain that we don't understand. We can, however, implement some known aspects of the structure and function of the biological brain and map them as analogue or digital images onto electronic substrates. By structure Karlheinz Meier means cell bodies, networks of axons and dendrites, and connections or synapses. Function, however, is far more important. There is local processing going on in the cells. There is communication via the incoming and outgoing fibers. There are dynamics through the processes of learning, plasticity and self-organisation.
Karlheinz Meier showed a concrete example from the Human Brain Project, called the Neuromorphic Computing Platform. At this moment, the project has two machines which are operational. They represent two fundamentally different approaches to brain-inspired computing. The first one is the SpiNNaker system at the University of Manchester, which consists of half a million ARM cores. To overcome the weak scaling problem, the designers have invested in routers on each of the chips. The other is the physical model system BrainScaleS, consisting of physical models of cells, neurons and synapses for local analogue computing. The system is binary and continuous in time. It is located at the University of Heidelberg.
SpiNNaker has 18 ARM968 cores per chip, uses integer arithmetic, runs at a 200 MHz processor clock, and is a shared system with RAM on die. The system has bi-directional links to other chips. The packets are small and optimized to transmit biological spikes. The system can transmit 6 million spikes per second per link. It is a real-time simulator and takes a drastic approach to weak scaling. BrainScaleS is a mixed-signal analogue-digital system. The VLSI circuits have very small capacitances compared to the biological system. This means that you have time constants which are accelerated compared to biology. The basic circuit is linear and very simple. However, neurons are highly nonlinear. The incoming synaptic signal is not just a spike, and there are delays in the system. There are many time constants. If you work hard, you can build a system where all the time constants are multiplied by the same factor, and you can make that factor much smaller than 1. In this case it is 1/10,000, so the system runs at 10,000 times the speed of biology. Time is imposed by internal physics and not by external control, as Karlheinz Meier explained.
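The idea that scaling every time constant by the same factor simply compresses time can be illustrated with a toy leaky integrate-and-fire neuron. This is a minimal sketch, not a model of the BrainScaleS circuits: the membrane and refractory time constants (10 ms and 2 ms), the drive and the threshold are assumed values chosen for illustration.

```python
def lif_spike_times(tau_m, tau_ref, dt, t_end, drive=1.5, threshold=1.0):
    """Euler-integrate a leaky integrate-and-fire neuron with constant drive.

    Membrane equation: dv/dt = (drive - v) / tau_m. When v reaches the
    threshold the neuron spikes, v is reset to 0, and the neuron stays
    silent for tau_ref.
    """
    alpha = dt / tau_m                      # fraction of the gap closed per step
    ref_steps = int(round(tau_ref / dt))    # refractory period in steps
    n_steps = int(round(t_end / dt))
    v, silent, spikes = 0.0, 0, []
    for step in range(n_steps):
        if silent > 0:
            silent -= 1
            continue
        v += alpha * (drive - v)
        if v >= threshold:
            spikes.append(step * dt)
            v = 0.0
            silent = ref_steps
    return spikes


# Assumed "biological" parameters, in seconds (illustrative only).
BIO = dict(tau_m=10e-3, tau_ref=2e-3, dt=0.1e-3, t_end=0.1)

# Speed-up factor quoted in the talk; every time constant shrinks by it.
SPEEDUP = 10_000
HW = {name: value / SPEEDUP for name, value in BIO.items()}

bio_spikes = lif_spike_times(**BIO)
hw_spikes = lif_spike_times(**HW)

if __name__ == "__main__":
    for b, h in zip(bio_spikes, hw_spikes):
        print(f"biological {b * 1e3:7.2f} ms  ->  accelerated {h * 1e6:7.3f} us")
```

Because only ratios of time constants enter the dynamics, the accelerated neuron produces the same spike pattern, just 10,000 times earlier on the wall clock.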
The next step is to build systems, and both systems have been open to the public since March 30, 2016. You can use these systems and even access them remotely. They look a lot like little supercomputers but they are not. These are really fundamentally different computing architectures. Can you do useful computation on these systems? Most of the networks that we use today are deterministic: they have an input pattern and an output pattern linked by the network. If you repeat the experiment, you will always get the same result. In stochastic networks, you have distributional patterns. You store stochastic distributional patterns which reflect your prior knowledge. You can use those stored patterns either to generate distributions or to do inference and discriminate. You present a pattern and say: "I've probably seen it before". The system samples from probability distributions that are stored in the system, Karlheinz Meier explained.
Karlheinz Meier and his team have done some experiments and he showed the audience three examples. One is deterministic supervised learning. An insect uses its chemical senses to distinguish different flowers. For a scientist, this is reverse engineering. The system has receptor neurons which respond to something. There is a correlation layer which basically serves for contrast enhancement. The inputs are combined to take a decision on the kind of flower, for example. This is data classification. The trick is to configure the link between the two layers. This is done by supervised learning. The intermediate layer connects the association for the input. There is also spiking activity but it is sparse. This is the reason why nature has invented spikes: it saves energy. Where interesting computing is being done, the firing rate is very low. The output layer shows results before and after training. After training, peaks show up, Karlheinz Meier indicated.
The second example was deterministic unsupervised learning. The owl compares the sound inputs between its two ears to locate the mouse. The model is very straightforward. If the mouse is on the right side, the sound has a shorter flight path to the right ear than to the left ear. If you want to detect the phase difference between left and right, you do it with an internal circuit. You look for coincidences by compensating the short path in the air with a long path in the brain. You detect time coincidences between two impulses. If those time coincidences are there, you produce a stronger signal. This is done in a completely unsupervised way. One can see how the synaptic weights evolve. The analogue synapses are imperfect so there is variability. The synapse features in the circuit vary by about 20% to 30%. Phase detection can be done to 10 nanoseconds.
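The coincidence idea can be sketched as a bank of detectors, each adding a different internal delay to the ear that hears the sound first. This is a minimal illustration of delay-line coincidence detection only; it does not model the unsupervised weight changes or the hardware, and the spike times, the 100 microsecond arrival difference and the 10 microsecond window are assumed values.

```python
def coincidence_counts(left_spikes, right_spikes, internal_delays, window):
    """Count coincident spike pairs for each candidate internal delay.

    The right-ear spikes are delayed by `delay` before comparison, which
    compensates for the shorter path the sound took to the right ear.
    """
    counts = []
    for delay in internal_delays:
        n = 0
        for t_left in left_spikes:
            for t_right in right_spikes:
                if abs(t_left - (t_right + delay)) <= window:
                    n += 1
        counts.append(n)
    return counts


# All times are integers in microseconds (illustrative values only).
ARRIVAL_DIFFERENCE = 100                         # sound reaches the left ear later
right_ear = [k * 1000 for k in range(20)]        # one spike per millisecond
left_ear = [t + ARRIVAL_DIFFERENCE for t in right_ear]

candidate_delays = list(range(0, 201, 25))       # 0, 25, ..., 200 microseconds
counts = coincidence_counts(left_ear, right_ear, candidate_delays, window=10)
best_delay = candidate_delays[counts.index(max(counts))]

if __name__ == "__main__":
    for delay, n in zip(candidate_delays, counts):
        print(f"internal delay {delay:>3} us: {n} coincidences")
    print(f"estimated arrival difference: {best_delay} us")
```

The detector whose internal delay matches the arrival difference sees every spike pair coincide, so its output is the strongest and it effectively encodes the direction of the sound.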
The third example was about stochastic supervised learning. Karlheinz Meier showed an image of an animal which could be a duck or a rabbit, but you can only see one animal at any given time. There are stored probabilities in the brain and one can take samples from them. This can be implemented with spiking Boltzmann machines. These are networks of symmetrically connected stochastic nodes. The state of the nodes is described by a vector of binary random variables. How can spiking neurons be binary? The neuron can be either active or refractory. The probability of the state vector converges to a target Boltzmann distribution. There is an energy in the exponent of the Boltzmann distribution. This is the classical energy factor. There is a very well established mechanism to train this. You clamp the visible units to the values of a particular input pattern and let the network reach thermal equilibrium. You increment the interaction between any two nodes that are both on. This is a simple learning process. It takes a long time, however, if it is implemented on conventional computers. You can run the network freely and sample from the stored probability distribution, or you can infer the input from certain clamped input variables, Karlheinz Meier explained.
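Sampling from a Boltzmann distribution, both freely and with some units clamped, can be sketched with an abstract Gibbs sampler. This is not the spiking implementation described in the talk: the nodes here are plain binary variables updated in software, training is left out, and the weights `W` and biases `b` are arbitrary values chosen for illustration.

```python
import itertools
import math
import random


def energy(z, W, b):
    """E(z) = -sum_{i<j} W[i][j] z_i z_j - sum_i b[i] z_i for binary z."""
    n = len(z)
    pairs = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * z[i] for i in range(n))


def exact_distribution(W, b):
    """Target Boltzmann distribution p(z) proportional to exp(-E(z))."""
    states = list(itertools.product((0, 1), repeat=len(b)))
    weights = [math.exp(-energy(z, W, b)) for z in states]
    total = sum(weights)
    return {z: w / total for z, w in zip(states, weights)}


def gibbs_sample(W, b, n_sweeps, rng, clamped=None):
    """Estimate state frequencies by Gibbs sampling.

    `clamped` maps unit index -> fixed value; clamped units are never updated.
    """
    clamped = clamped or {}
    n = len(b)
    z = [clamped.get(i, rng.randint(0, 1)) for i in range(n)]
    counts = {}
    for _ in range(n_sweeps):
        for i in range(n):
            if i in clamped:
                continue
            # Input to unit i from its bias and the units that are on.
            u = b[i] + sum(W[i][j] * z[j] for j in range(n) if j != i)
            p_on = 1.0 / (1.0 + math.exp(-u))
            z[i] = 1 if rng.random() < p_on else 0
        state = tuple(z)
        counts[state] = counts.get(state, 0) + 1
    return {state: c / n_sweeps for state, c in counts.items()}


# Arbitrary symmetric weights with zero diagonal, and biases (illustrative).
W = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [0.2, -0.3, 0.1]

rng = random.Random(0)
exact = exact_distribution(W, b)
free_run = gibbs_sample(W, b, n_sweeps=20000, rng=rng)
inferred = gibbs_sample(W, b, n_sweeps=20000, rng=rng, clamped={0: 1})

if __name__ == "__main__":
    for state, p in exact.items():
        print(state, f"exact {p:.3f}", f"sampled {free_run.get(state, 0.0):.3f}")
```

In the free run the sampled frequencies approach the exact Boltzmann probabilities, while the clamped run only visits states consistent with the clamped unit, which is the sense in which the network performs inference.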
Karlheinz Meier showed a nice example of a real hardware experiment where one can see the time advantage. It happens in milliseconds. The systems that are used are not yet optimized for energy efficiency but they are already performing pretty well. One can measure the energy used for a synaptic transmission. It is typically 10,000,000 times more energy efficient than state-of-the-art HPC but 10,000 times less efficient than biology. The most important aspect of neuromorphic computing is the speed advantage of using accelerated models. If you look at nature, there are time scales from sub-milliseconds to years, spanning 12 orders of magnitude. If you run this in a simulation that runs a factor of 1,000 slower than reality, you can easily look at synaptic plasticity. It is clear, however, that learning and development in particular are totally inaccessible on a computer. It would take thousands of years. If you have a real-time system it will take years. You can do this if you perform robotic experiments, but Karlheinz Meier said that the alternative of using the accelerated model is interesting because you can compress one thousand years to a couple of seconds.
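The three regimes in this paragraph can be compared with a little arithmetic. The sketch below uses the factors quoted in the talk (1,000 times slower for simulation, real time, and 10,000 times faster for the accelerated model); the function and dictionary names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def wall_clock_seconds(biological_seconds, speed_factor):
    """Wall-clock time needed to cover a span of biological time.

    speed_factor > 1 means faster than biology, < 1 means slower.
    """
    return biological_seconds / speed_factor


# Speed factors relative to biological real time, as quoted in the talk.
REGIMES = {
    "simulation, 1,000x slower": 1 / 1000,
    "real-time system": 1.0,
    "accelerated model, 10,000x faster": 10_000.0,
}

if __name__ == "__main__":
    for name, factor in REGIMES.items():
        day = wall_clock_seconds(SECONDS_PER_DAY, factor)
        year = wall_clock_seconds(SECONDS_PER_YEAR, factor)
        print(f"{name}: one biological day takes {day:,.2f} s, "
              f"one biological year takes {year / SECONDS_PER_YEAR:,.6f} years")
```

With these factors, one biological day takes about 2.7 years in the slow simulation and under nine seconds on the accelerated system, while one biological year takes a thousand years in simulation and a little under an hour when accelerated.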
Karlheinz Meier concluded by saying that after 10 years of development, the available hardware systems have reached a high degree of maturity and are ready for non-expert use cases. There is a high degree of configurability with dedicated software tools, but these systems are obviously no replacement for general-purpose machines. They are restricted in their use to emulating spiking neural networks. The only way to access the multiple time scales present in large-scale neural systems is to make them functional. You have to let them learn. This approach is well suited for stochastic inference computing and for the use of deep-submicron, non-CMOS devices, because there is resilience and there is a way to compensate for limited reliability. To evaluate the architecture, CMOS is perfect, according to Karlheinz Meier. Once non-CMOS devices come up, which are not available today, this would probably be a very nice use case.
The workshop is covered in full in five articles: