An evening with Thomas Sterling: Exascale taking shape at a global level and down the rabbit hole of loop quantum gravity

21 Jun 2017 Frankfurt - Right after the Award Ceremony of the HPCAC-ISC Student Cluster Competition 2017 at ISC, Session Chair Frank Baetke introduced Professor Thomas Sterling from Indiana University to the audience that was ready for the 14th edition of the legendary "HPC Year in Review" series. As the 2017 theme, Thomas Sterling had picked "Planning for Exascale" instead of machine learning or Deep Learning, explaining that we now have a fairly clear picture of what exascale is going to be.

Thomas Sterling insisted that he made no claims that his overview was purely objective or in every respect actually factual, but on the other hand he stated that he was not selling an agenda. He admitted that he tends to like wires and hardware a little bit more than software, while realising all too well that software justifies building and paying for the machines.

Thomas Sterling chose the elements that he thinks represent the overall trends we are facing. One of those elements is the strong clarification and redirection of where the HPC community is going. It is both expanding through the application to data sciences, with machine learning and other aspects of artificial intelligence, and concentrating on very large, complex phenomenology and detailed numeric processing. Thomas Sterling thus selected the concrete plans for Exaflops.

He exclaimed that the 100 Petaflops are here and once again congratulated the colleagues in China on their TaihuLight machine, which has an original architecture, not a copy of anybody else's design, and is entirely homegrown. This, he insisted, is still a remarkable accomplishment, even more so because of the applications that they have running on it.

A very significant development, one that continues to be a positive trend within the HPC field, is in the area of energy efficiency. This is something critical, according to the speaker, because if we are going to get to and beyond Exaflops computing, we have to bring the energy efficiency up by a substantial degree.

Graph processing, in Thomas Sterling's view, is the really big hurdle in front of us. It is now being taken seriously as a complete paradigm shift. For 65 years, developers have been dedicated to matrix computations, with hardware support ranging from indexing and vector processing to pipelining. Now, the data structures are immersed in metadata, but the hardware doesn't know how to use that. Many different companies in multiple nations, and many different academic and industrial institutions, are applying graph processing and learning how to accelerate it. Thomas Sterling thought this is very important and hoped to say more about it next year.
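To make the contrast concrete, here is a minimal Python sketch, not taken from the talk and with purely illustrative data, that puts the regular, easily vectorized indexing of a dense matrix kernel next to the metadata-driven, pointer-chasing accesses of a graph traversal, the access pattern today's hardware has little support for.

    # Illustrative sketch only: the regular indexing of a dense matrix kernel
    # versus the irregular, metadata-driven accesses of a graph traversal.
    from collections import deque

    def dense_matvec(A, x):
        """Dense matrix-vector product: contiguous, unit-stride accesses that
        vector units, pipelines and prefetchers handle very well."""
        y = [0.0] * len(A)
        for i, row in enumerate(A):
            for j, a in enumerate(row):
                y[i] += a * x[j]
        return y

    def bfs(adjacency, source):
        """Breadth-first traversal over an adjacency list: data-dependent jumps
        driven by the graph's metadata (the edge lists), which the hardware
        cannot predict or prefetch effectively."""
        visited = {source}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in adjacency[v]:
                if u not in visited:
                    visited.add(u)
                    queue.append(u)
        return order

    if __name__ == "__main__":
        print(dense_matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # [3.0, 7.0]
        print(bfs({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))          # [0, 1, 2, 3]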

As for machine learning, Thomas Sterling believes that this is a field in flux and, to some degree, a field in hype. He said he has lived through too many of these moments, which doesn't mean that the field lacks promise, importance or impact. Yet, he thought it has been overstated in terms of its dominance in the area of artificial intelligence.

Thomas Sterling especially mentioned Hewlett Packard Enterprise (HPE) as a company that has experienced an important transition. It has now merged with SGI and this is a significant change. The company has always been successful in the field of HPC, always running number one or number two in the number of deployed systems and developing really remarkable, beautifully engineered machines. Now, finally, the company has shown the courage to move up to the front row, where it is competing to make contributions in the area of exascale, rather than waiting for everybody else to do it and then coming up with something about an order of magnitude beyond. Thomas Sterling thought it is very important to see this change and that it is wonderful to have another major engineering operation competing and pushing the true edge of the envelope.

Thomas Sterling never thought he would be talking about quantum computing in front of this HPC audience. He could remember standing in this very spot, trying to explain quantum computing, which was something like teaching computer science at Harvard. He joked that it is easier these days to work with something that you don't have to understand at all.

Thomas Sterling also mentioned the most important scientific breakthrough of the last year, one that has only been possible, in part, because it was enabled by supercomputing, yet this important breakthrough actually didn't happen. He promised to finish his talk with that and smiled mysteriously.

Thomas Sterling briefly talked about the TSUBAME3.0 architecture. The team at the Tokyo Institute of Technology, led by Satoshi Matsuoka, has deliberately and consistently pushed the practical edge of the envelope in architecture to bring up the energy efficiency. The TSUBAME3.0 system, which has very complicated interconnects; a balance of processors, coprocessors and accelerators; and a surprising cooling system, was announced as the number one machine on the Green500, showing an impressive efficiency of 14.110 gigaflops/watt. This is a really impressive number. Thomas Sterling remembered when it was really hard to get 1 gigaflops/watt for a system. This is more than an order of magnitude better. In fact, it is less than a factor of 4 away from what we need to achieve an Exaflops computer. From various talks given by Erich Strohmaier, we can see that breakpoint and the successive progression to which the people at Tokyo Tech have been a major contributor.
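That "less than a factor of 4" remark can be checked with simple arithmetic against the 20 megawatt exascale power target Sterling returns to later in the talk; the short sketch below uses only the two numbers mentioned in the keynote.

    # Back-of-the-envelope check of the "less than a factor of 4" claim,
    # using the 20 MW exascale power target cited later in the talk.
    exaflops = 1e18                     # one exaflops, in flops
    power_target_watts = 20e6           # 20 megawatts
    required = exaflops / power_target_watts / 1e9
    print(required)                     # 50.0 gigaflops/watt needed

    tsubame3 = 14.110                   # TSUBAME3.0, Green500 June 2017
    print(required / tsubame3)          # ~3.5, i.e. less than a factor of 4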

If you look at the Top500 chart, you can't help but notice that there seems to be a continuous story about how to achieve this. Almost every one of these platforms takes advantage of the NVIDIA Tesla P100. This demonstrates that when you have hardware that is carefully aligned in the proper kind of cluster, one that follows the workflow of the data path so you don't have to use intermediate buffers, which are energy consuming as well as time consuming, then you can get superior performance and also superior energy efficiency for those idioms, those algorithm elements, Thomas Sterling showed.

Looking at the uncertainties and the complexities, what Thomas Sterling has seen this year is a commitment internationally towards exascale computing, not for the sake of pride and stature, or to make a claim on some particular squiggly chart, but because we understand that in the new society we live in, which is scientific, economic, social, and coping with defense and security, the highest end of computing can help in all of those ways. Thomas Sterling presented five factors that are shared by every institution and every government agency around the world looking to make a useful and accessible tool and to train us, but more importantly, to train the next generation to be able to apply those tools to the future.

In the US, Thomas Sterling was part of a committee that identified the top 10 technical challenges for exascale. These challenges are not ordered by priority because they are so interrelated, so cross-coupled, that no one of them is good without the others. Thomas Sterling wanted to highlight the importance of scientific productivity. He didn't know how to measure that. He has tried. He has worked with equations, he has published equations, but he doesn't know what the unit for scientific productivity is and, yet, that is the N factor.

At the top of the list is energy efficiency. Thomas Sterling knows the person who proposed the standard, the requirement, the threshold of 20 megawatts for an Exaflops computer. Someone from industry in the room said: 'No, that's no good', arguing it was maybe 20 to 40 megawatts, but the truth is that 20 megawatts is too high, not too low, according to Thomas Sterling, because we don't want to build one-offs. He said that we want everyone with a machine room to have an exascale machine, so that they can all do the breakthrough science, engineering, social requirements, and defense as well. Thomas Sterling gave credit to the Exascale Computing Project (ECP) for the slides he showed at this point of his presentation. They highlight the intentions, the philosophy and the goals of the US Exascale Computing Project.

He presented a simple chart that shows the importance of this approach. The team uses co-design, combining vast developments in applications, in software and in hardware technology, to ultimately create the exascale systems. For how long has it been that we've mostly had hardware thrown over the fence at the programming people, who then sort of had to figure out how to make it work, Thomas Sterling asked. The ECP initiative is important because it tries to move the bar forward on all of these together. This is the right and more advanced philosophy. Thomas Sterling said he loved the chart he showed to the audience because it shows how Linpack performance has increased through advances in hardware and software technology. Back in the days of moving from Gigaflops up to Teraflops in 1997, this came mostly from the actual technology advances, known as Moore's law, although Moore's law also required an increased density and an increased degree of parallelism. There are a lot of us throwing error bars on these. This took us a long way down the exponential slope.

As we got past teraflops, we found that we were getting less performance. Around 2005, we had to move to the new usage of multi-core, which means an added amount of parallelism, and past that point we are now dedicated to working towards the magnitude of parallelism we expect, with less than a factor of 2 coming from the actual hardware delivered, Thomas Sterling explained. We are truly at the end of the exponential growth of Moore's law and we face new challenges in being able to increase the delivered performance on ever more complicated applications. Yet, for some of us, this is perhaps one of the most exciting times for system development in architecture, and perhaps even programming models, since the 1980s, he said.

There is a wide range of key application domains that the exascale project is focusing on in order to advance. The true measurement of success is the success and the achievement at the science level, at the end-result level, not necessarily the particular values. Thomas Sterling mentioned applications related to wind turbines, solar energy, and climate change simulation.

Thomas Sterling turned to the software stack, showing the ECP's particularly insightful version of it. There is a new relationship between the node operating system and the system operating system, and an appreciation and understanding of how the two work in combination, in synergy, to achieve the improved result. The stack also points out the importance of math libraries and frameworks. These are so important because user productivity, as well as performance portability across a diversity of machine types, machine generations and machine scales, is to a major degree going to depend on the achievement of math libraries that are themselves portable, making it much easier for the user. This is the path to success. People say that they are willing to change once. They mean that they are okay; they are a community; they are not scared of the future, but they are not going to waste their time either. This kind of software stack, in relation with its elements, demonstrates a path to achieving that goal.

One example of the way the US is going to exascale is through the stages of interim machines in the regime of hundreds of Petaflops, such as the Coral systems. There will be three intermediate machines. Two of these will be Summit and Sierra, at Oak Ridge National Laboratory in Tennessee and at Lawrence Livermore National Laboratory in California. Thomas Sterling also showed the Aurora machine to be deployed at Argonne National Laboratory. These are not all the same machines. The first two are based on the IBM Power9 processor, a decidedly heavyweight design. The Power series has been an elegant architecture, and NVIDIA acceleration is added to that. Oak Ridge will probably be the first point of deployment, some time towards the end of 2018. Aurora is the counter design, built on fine-grain processors based on Intel's Knights Hill. All these machines will be in the two hundred Petaflops range of performance, somewhere between peak and delivered, Thomas Sterling said. They will give us an enormous amount of information as the US considers the exascale machines that come shortly thereafter, in the 2022 to 2024 regime, at different stages of deployment.

The Europeans have a very firm plan for how to advance towards exascale, Sterling noted. They are setting up Centres of Excellence in a number of domains of science and engineering. They are investing heavily in future system software frameworks and environments, to make them readily available to computational scientists and domain scientists, recognizing the need to advance concurrently in the areas of data-intensive programming, tools, algorithms, mathematics, memory, and storage. Thomas Sterling moved to the area of hardware, showing a slide which had HPC system architectures in the upper left-hand corner. When you look at the overall history of the European Union, you see tremendous advances in Clouds and data processing, in programming environments, and especially in applications. A billion euro has been committed to graphene and a billion euro has been committed to brain modelling, and yet you haven't really seen that kind of energy or enthusiasm committed to the sometimes thankless task of trying to wire processors together.

However, there are rumours that two different classes of machines are going to be developed for exascale. For reasons Thomas Sterling does not understand, France seems to have its name tagged onto these machines. All he could say was to stay tuned until next year, when he hoped he would know more about it.

Thomas Sterling expressed his apologies to his colleagues in China. You won't talk, he said, and he wanted to know what they are doing. He said he understood: they are number one; they have been number one; and when they weren't number one, they became number one. He showed his appreciation for these accomplishments but would like to know what they are doing. China has announced that it will deliver and deploy its first exascale machine, which Thomas Sterling thought one could simply refer to as THE first exascale machine, in the year 2020. This is not just bragging: you can take these people seriously. They are not fooling around. The importance of the Sunway machine is that they designed it from the sand up, and they are continuing to expand their own architecture designs and fabrication capabilities. For example, the Matrix2000 GPDSP, at 2.4 Tflops double precision, is a specialized processor. There are a number of key players: Inspur, NUDT and Sunway are just a few of them. Thomas Sterling has noticed a rapid increase in the number of application programmers using these systems and in the number of domains in which they are being used.

Japan, you've got to hand it to them, as Thomas Sterling expressed it: when they build a machine, they build a great machine. He could remember being in Japan with Jack Dongarra and lots of other people to see the Earth Simulator. These machines are well built. Their networking balance between latency and bandwidth is the best we can get in supercomputing. This was also true with the delivery of the K machine five or six years ago. There is a rapidly increasing and firm commitment to the achievement of implementing AI. AI can be used to mean a lot of different things. Thomas Sterling reached back to a distant past: in the 1980s, the Japanese had the 5th Generation Computer project, in which they invested heavily and in very clever ways to explore and expand the ability to bring smart computing to bear on real problems. There has been a long, stable and growing effort in the applications of artificial intelligence, feeding the lessons learned back into the software systems, including programming models, and helping to drive the structures of the future architectures that are happening.

Thomas Sterling showed a slide of the post-K machine, which is the next system that the Japanese will be deploying. This, too, will serve as a prototype of their exascale system. If you look at the specifications, they are going to be within a hair's breadth, at least in peak performance, of achieving exascale itself, but they are not going to do that.

Thomas Sterling went over to the "In Memoriam" part of his keynote, stating that it is ever so important to take a moment to acknowledge the passing of some people who have truly contributed to where we are, even if we have never heard of them. He said that the audience probably never heard of Herbert Richman. In the 1960s and 70s, there was a very different and exciting world with the invention of the mini-computer, which signalled that computing was on its way to being for everybody, as opposed to the six machines that Thomas Watson thought the world would need. The mini-computer market was hotly contested. The two major companies were Digital Equipment Corporation and the company that Herbert Richman co-founded in the 1960s, namely Data General. Thomas Sterling suspected that the audience might not have heard of Data General, but at that time Data General was cool, he said. A very young Steve Wallach worked for Data General. The company had many successes and many failures, but it was at a time when people could take risks. Thomas Sterling was pleased that he could take a moment to acknowledge Herbert Richman's many contributions.

Thomas Sterling thought the audience might have heard of Charles P. Thacker. He was with Xerox PARC and contributed to key developments in networking and distributed computing. He was a driver of the first experimental tablets. He built the first hardware for Ethernet. He did the implementation, and the implementers are in control because they determine success and failure. He was a key player in the development of the Xerox Alto, a real pointer to the future. He was posthumously awarded the Eckert-Mauchly Award, the highest award that you can receive in computer architecture.

Thomas Sterling turned to the people who received awards in the past year. The first one is William Camp, who ran an operation at Sandia National Laboratories that resulted in many different things. The citation mentions his visionary leadership of the Red Storm project and four decades of leadership in high-performance computing. The Red Storm project became the prototype for the renaissance at Cray because it became the Cray XT3, which led on to a whole family of high-performance computing systems. William Camp has won the Seymour Cray Award.

Vipin Kumar received the Sidney Fernbach Award for foundational work on understanding scalability and highly scalable algorithms for graph partitioning, sparse linear systems, and data mining. Vipin Kumar is always focused on the next problem, and always mixes practical outcome with theoretical insight, Thomas Sterling testified.

William Gropp is the winner of the Ken Kennedy Award for highly influential contributions to the programmability of high-performance parallel and distributed computers, and extraordinary service to the profession. If Thomas Sterling has ever read an understated accolade, this is the most understated one, he said. Every one of us who has touched a supercomputer in one way or another is a beneficiary of the many decades of contributions that William Gropp has made. The ACM Turing Award went to Sir Tim Berners-Lee for inventing the World Wide Web. All four winners have made our world and the profession a better place, as Thomas Sterling stated.

Thomas Sterling said he understood that there were students present in the room who were participating in the STEM Student Day organized at ISC'17. He wanted to praise ISC for starting this activity to attract students into the world of HPC through the experience of ISC, exposing them to the technical skills that are involved, making them understand that these skills are usable, and at the same time telling them about careers that are available in HPC. Thomas Sterling expressed his congratulations to organizer Nages Sieslack and his thanks to the many sponsors who funded this event. He said that it is a big responsibility to prepare the next generation for HPC.

Every year, there is some analysis of the field. Thomas Sterling borrowed the material from Jack Dongarra, Erich Strohmaier, and Horst Simon. It is important to do the data mining within the Linpack benchmark results. It helps us to appreciate the market and the design challenges, according to Thomas Sterling. What he discovered was that he can really consider it not as the TOP500 list but as three basically different worlds of supercomputing. 90% of the machines are practically within a factor of 2 to 4 of each other in performance. They are almost all at the same level. Then there are the high-end machines, which are in fact the machines that we talk and write about. We talk about the Top 10. We talk about exascale, but why are we not talking about the memory capacity, Thomas Sterling asked. It is half the price of the machine. Why are we not talking about the memory bandwidth, which is in fact the principal factor in achieving delivered performance?

Thomas Sterling showed a very consistent pattern for the memory size. The amount of memory at the top of the list is much larger than the memory size across the rest of the world, including the mainstream or the long tail. When you look at it a different way, you find that in most cases the smaller machines have a better, healthier ratio of the amount of memory you have to the floating-point operations that you can perform at peak, according to Thomas Sterling.
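The ratio behind this observation is simply bytes of main memory per peak floating-point operation per second. A minimal sketch follows; the two example systems and their numbers are hypothetical, chosen only to illustrate how a smaller machine can end up with the healthier memory-to-flops ratio.

    # Hypothetical illustration of the memory-to-peak-performance ratio:
    # bytes of main memory per peak flops. The numbers are made up and are
    # not taken from the TOP500 list.
    def bytes_per_flop(memory_bytes, peak_flops):
        return memory_bytes / peak_flops

    large = bytes_per_flop(memory_bytes=1.3e15, peak_flops=100e15)  # 1.3 PB, 100 PF
    small = bytes_per_flop(memory_bytes=0.2e15, peak_flops=2e15)    # 200 TB, 2 PF

    print(f"large system: {large:.3f} bytes/flop")  # 0.013
    print(f"small system: {small:.3f} bytes/flop")  # 0.100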

Thomas Sterling also briefly addressed machine learning and deep learning, as well as brain-inspired computing. He said he had just come from a conference in Italy entitled "Brain-Inspired Computing". This involves many things: trying to understand what the real brain is and also trying to be motivated and informed by the structure of the brain in making new generations of computers. The question is: why are we so fascinated with the human brain? There are a few major brain exploration projects. In the European Union, there is the Human Brain Project of 1 billion euro. Earlier on, there was the Blue Brain Project, a Swiss initiative together with IBM, which is still running and receiving money. In the US, there is the BRAIN Initiative, principally run by the National Institutes of Health, with overlapping but really different emphasis. The BRAIN Initiative emphasizes the medical implications and opportunities of understanding the structure, the dynamics and the chemistry of the human brain for addressing problems such as Alzheimer's Disease and many other brain diseases.

In simulations focusing on the brain, there are two general approaches. One is to have as many surrogates for the neurons as you can, and then the largest possible network that you can. The assumption here is that the brain's functionality is primarily represented by how the networking, the synergy of the neurons, is organized. The other approach is to model with the highest precision what the neuron is, assuming that its exact functionality, in partnership with other neurons, is what ultimately determines behaviour, and to look at that. Both of these approaches are very important today. But then there is the brain-inspired model. Here the question is: how do I build a modern machine? IBM, this year, has announced its work on TrueNorth, which has the equivalent of a million neurons and a quarter of a billion synaptic junctions, with five billion transistors on a chip. TrueNorth uses a non-von Neumann programming model to make it easy to set this up, and then it can be applied to 3D real-world applications, Thomas Sterling explained.

Finally, Thomas Sterling wanted to say something about the greatest HPC-enabled scientific discovery that didn't happen this past year, one that is going to have an enormous impact on our understanding of the reality of which we are a part and that is likely to change the course of the theoretical physics that has gone on for the last 10 to 15 years. Thomas Sterling quickly mentioned CERN's search for the Higgs boson and the big instruments used in the ATLAS project, which is producing gazillions of bytes of data. Computers spread over 167 different sites are analysing this data. Titan alone, in the US, is running hundreds of millions of processor hours per week for the data analysis of the Large Hadron Collider at CERN. This is as much a computing project as it is a giant synchrotron project.

Thomas Sterling then showed a slide with the very big discovery that we do not see. There should have been a bump on the yellow line going up into that space, but that space is empty. The scientists had to choose their words carefully, because there was nothing out there to point to; there wasn't anything else. What should have been there was some change in the asymptote in the lower half, plus some bumps, because there were supposed to be masses identified that reflect supersymmetric particles. Without supersymmetry, a whole space of modelling, theoretically known as string theory and superstring theory, can't be right. What this means is that we didn't discover it, and this negative, non-confirming result tells us we have to look elsewhere, and that elsewhere is probably loop quantum gravity. Loop quantum gravity has one lovely property: it actually helps to combine both general relativity and quantum mechanics, because it defines a finite granularity to the gravitational field, which is the definition of space itself, and to quantized time. With that, a lot of infinities go away. Loop quantum gravity is supported by the lack of supersymmetry. Nothing could be more profound, more fundamental, as we inch our way forward to an understanding of the reality of the cosmos and our place in it, than a negative result that fails to confirm the expectations, Thomas Sterling concluded.

Leslie Versweyveld