He explained to the audience that locating objects is hard in computer vision, because the camera only sees pixels. He also compared deep learning with the neurons in the brain. Deep learning works with a neural network, which is inspired by the human brain but operates at a much lower level.
A neural network works with vector representations. The network computes a series of vectors and is commonly depicted with diagrams. Neural networks are learning algorithms. Supervised learning consists of learning from labeled data. Given an input X, such as an image, the question is: is this a coffee mug? The output Y answers yes or no.
Andrew Ng asked himself why deep learning is taking off now. It is like building space rockets. You need an engine and you need fuel for the rocket to be launched. Both the engine and the fuel supply have to be huge. By analogy, deep learning needs large neural networks and a huge amount of data.
During the past 10 years computation has scaled enormously: from a CPU with 1 million connections in 2007 to a GPU with 10 million connections in 2008, and from a cloud of many CPUs with 1 billion connections in 2011 to HPC systems of many GPUs with 100 billion connections in 2015.
In order to train a model, as Andrew Ng explained, you need 10 exaflops, $100 of electricity, and 4 TB of data.
Andrew Ng has been involved in speech recognition and put it forward as an example to the audience. In speech recognition the data is audio, which is turned into audio features, then into phonemes, then passed through a language model, and finally turned into a transcript.
The phonemes look like this: de kwik brawn foks.
The final transcript has turned it into: The quick brown fox.
Deep Speech requires end-to-end learning. The speech dataset comprises 45,000 hours a day. As speech recognition improves, the error rate decreases. Most people, however, do not understand the difference between 95% accuracy and 99% accuracy, Andrew Ng explained, but 99% accuracy is in fact game changing. The goal is to let people communicate with devices through speech.
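The gap between 95% and 99% accuracy is easier to see when it is expressed as an error count. The following is a minimal sketch, assuming a hypothetical 100-word utterance; the utterance length and the function name are illustrative and do not come from the talk.

```python
def expected_errors(accuracy_percent: int, num_words: int) -> float:
    """Expected number of misrecognized words at a given word accuracy."""
    return (100 - accuracy_percent) * num_words / 100


# Hypothetical 100-word utterance, chosen only for illustration.
errors_at_95 = expected_errors(95, 100)
errors_at_99 = expected_errors(99, 100)

# How many times fewer errors the 99% system makes on the same utterance.
reduction = errors_at_95 / errors_at_99

print(errors_at_95, errors_at_99, reduction)
```

Under these assumptions, moving from 95% to 99% accuracy takes the user from five wrong words in a hundred to one, a five-fold reduction in the corrections to be made.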
Usage of Baidu speech recognition has grown 3-fold since January 2015. Major trend no. 1, according to Andrew Ng, is that scale drives AI progress. The x-axis shows the amount of data and the y-axis shows the performance.
With a traditional algorithm, performance stops growing as data is added, but if you turn a small neural network into a medium one and finally into a large one, performance keeps improving.
Andrew Ng proposed a basic recipe for machine learning. First he asks: does it do well on the training data? If the answer is no, you have to build a bigger neural network. Then the question is: does it do well on the test data? If the answer is no, you have to get more training data. Scale is needed to build a bigger network. A bigger network needs weak scaling.
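The two questions of the recipe can be read as a small decision procedure. The sketch below is one way to write it down in Python; the function name, the use of error rates, and the single `target_error` threshold are assumptions made for illustration, since the talk did not specify how "doing well" is measured.

```python
def next_step(train_error: float, test_error: float, target_error: float) -> str:
    """Suggest the next action by asking the recipe's two questions in order."""
    # Question 1: does the model do well on the training data?
    if train_error > target_error:
        return "build a bigger neural network"
    # Question 2: does the model do well on the test data?
    if test_error > target_error:
        return "get more training data"
    return "done"


# Hypothetical error rates, for illustration only.
print(next_step(train_error=0.20, test_error=0.25, target_error=0.05))
print(next_step(train_error=0.03, test_error=0.15, target_error=0.05))
print(next_step(train_error=0.03, test_error=0.04, target_error=0.05))
```

The order of the checks matters: the training data question is asked first, so more data is only suggested once the network is large enough to fit the training set.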
Andrew Ng invited Bryan Catanzaro on stage to talk about the new technology. Bryan Catanzaro explained how to scale up the training. Strong scaling is needed for the data, and this is hard. He showed that all went well up to 60 processors, but then the line flattened. The amount of parallelism is reduced. On the GPU core, the data is loaded in 54 ns (0.2%), whereas loading the parameters takes 20 µs, which is 85% of the time.
Andrew Ng said that one needs persistent kernels for recurrent neural networks. You need to map a subset of neurons to a single core. About 97% of the time is then spent on compute, performing map operations, and loading the parameters accounts for only 1.5%.
However, persistent kernels are difficult. This needs careful balancing of several performance limiters, including the communication latency, the barrier latency, and the memory bandwidth. There are limitations to the technique. You have to hold all the parameters on chip. Andrew Ng said that persistent kernels enable strong scaling.
The traditional CPU architecture for inference (deployment) takes a 4x1 input vector and produces a 2x1 output vector. It is difficult to scale. Batch dispatch instead uses GPUs for inference (deployment), running on a GPU/SIMD processor. Baidu team member Chris Fougner has put great effort into this. Andrew Ng explained that it serves 7 times more users at the same latency.
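The idea behind batching inference requests can be sketched with the 4x1 input and 2x1 output mentioned above. The Python below is a toy illustration, not Baidu's implementation: the weight values and function names are hypothetical, and pure Python gains no speed from batching. It only shows that the batched form reads each row of weights once and reuses it for every request, which is the property a GPU/SIMD processor can exploit.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical 2x4 weight matrix: maps a 4x1 input to a 2x1 output.
WEIGHTS: Matrix = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 3.0],
]


def infer_one(x: Vector) -> Vector:
    """Serve a single request as a matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]


def infer_batch(batch: List[Vector]) -> List[Vector]:
    """Serve many requests in one pass over the weights."""
    outputs = [[0.0] * len(WEIGHTS) for _ in batch]
    for j, row in enumerate(WEIGHTS):      # each weight row is read once...
        for i, x in enumerate(batch):      # ...and reused for every request
            outputs[i][j] = sum(w * xi for w, xi in zip(row, x))
    return outputs


requests = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
batched = infer_batch(requests)

# Batching changes the schedule, not the results.
assert batched == [infer_one(x) for x in requests]
print(batched)
```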
Major trend no. 2 is about learning complex outputs, as Andrew Ng stated. Classifying an e-mail as spam is a 0/1 output. Predicting whether an online advertisement is clicked is a 0/1 output as well. Recognizing an object in an image takes an output of 1, ..., 1000. The major limitation is that this needs lots of labeled data. Andrew Ng said that it is about learning to caption using recurrent neural networks.
He asked the audience to vote on which of the following industries will be most affected by AI in the next 5 years:
Andrew Ng provided some examples.
In the DuLight project, Andrew Ng explained, the aim is to help the visually impaired. This project incorporates computer vision, face recognition and speech. In health care, the aim is to help doctors and patients. You start with a free-text entry consisting of a question and symptoms, and you submit the query. A doctor will respond in 10 minutes. Suggested reading is offered as an alternative.
In autonomous driving, you start with small "autonomy enabled" regions, and you grow those regions. Modest changes are made to the road infrastructure. In the end, you have visually distinctive cars and you can set appropriate expectations.
There are many more examples, such as web search; advertising; consumer finance; fraud detection; optimizing on-demand services logistics; predicting data centre hardware failures; malware detection; identity verification (voiceprint, face), and so on.
Andrew Ng held another poll, asking the audience to vote on which of the previously mentioned industries will be least affected by AI in the next 5 years.
One should definitely use AI to help people and businesses, according to Andrew Ng. There are several upcoming trends in AI.
In the past, you hired a VP for electricity to "sprinkle on" some electricity. This consisted of adding individual electric motors, before finally redesigning the manufacturing plant to use electricity.
In the same way, in the short term you can hire a VP for AI or a chief data officer and build a centralized AI function to help the company. In this way, you can "sprinkle on" AI to the existing businesses. In the long term, you have to deeply incorporate AI into the business.
It is all about network architectures and developing new applications. However, modifying network parameters and black magic are not good methods, Andrew Ng warned the audience.
Andrew Ng has published a book about his findings which is titled "Machine Learning Yearning". More information on it can be found at http://mylearning.org . If you sign up by June 24 on the website, you can get a free draft copy.
Andrew Ng ended by saying that the fastest supercomputer is 10,000x faster than a single GPU. He hopes that AI will help drive HPC research and called for better communication routines, including all-reduce and point-to-point, and for better programming models through improvements to MPI.
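For readers unfamiliar with all-reduce: it combines one contribution per worker and leaves the combined result on every worker. In data-parallel training it is commonly used to sum gradients. The sketch below simulates a sum all-reduce inside a single Python process to show the semantics only. It is not an MPI program, the function name and gradient values are hypothetical, and a real implementation would rely on a communication library routine such as `MPI_Allreduce`.

```python
from typing import List


def all_reduce_sum(worker_values: List[List[float]]) -> List[List[float]]:
    """Return what each worker holds after a sum all-reduce (simulated)."""
    length = len(worker_values[0])
    # Element-wise sum of the contributions from all workers.
    total = [sum(values[i] for values in worker_values) for i in range(length)]
    # Every worker ends up with its own copy of the same total.
    return [list(total) for _ in worker_values]


# Hypothetical per-worker gradients for a model with two parameters.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_sum(gradients)
print(reduced)
```

The cost of this step grows with the number of workers and the size of the model, which is why faster all-reduce routines matter for training at HPC scale.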
Andrew Ng stated that AI and HPC are superpowers, so learning how to use them is paramount.
To learn about artificial intelligence, there is a Coursera course at ml-class.org.