Why IBM can be found in the upper region of the TOP500

4 Jun 2010 Hamburg - IBM can always be found in the upper region of the TOP500. Hence we talked to IBM's Klaus Gottschalk to find out about the new, important developments in HPC data processing - and what lies ahead on the road to Exascale.

Klaus Gottschalk: For supercomputing, data is becoming more and more important. We have a new service offering for that, called scale-out NAS file service.

Primeur magazine: And that is using GPFS (General Parallel File System)?

Klaus Gottschalk: It uses GPFS internally for the disk access and to provide a file system. On top of that there are specialized access nodes which provide the different protocols. So it is a single namespace and you can scale it out: we can have two, three, four systems sitting next to each other, and the file space spans all of them. That is the difference from other solutions on the market. If you have an ordinary NAS system and you need a second one, you have two separate file spaces.

GPFS is very well proven. It is a stable file system that provides very good capacity and throughput characteristics. For the HPC world it is a proven solution; most of our supercomputer systems actually run with GPFS as their file system. We are now trying to bring GPFS, in this new appliance, to a larger market. So we provide file services not only for HPC, but also for HTC. It will also be a solution for multimedia, and for any company with large file-sharing needs.

Primeur magazine: But GPFS is still IBM only?

Klaus Gottschalk: It is owned by IBM, but we support different hardware vendors. GPFS these days is supported on any x86 system. It is a product of IBM. It is not open source, but for academic sites we have a source code license to develop new extensions or to port it to a system which we do not support. They can get the source code from us for free.

Primeur magazine: Is it also used in research projects?

Klaus Gottschalk: One of our installations in Europe is the DEISA consortium, the group of large HPC sites all over Europe. There are some non-IBM sites in DEISA. The IBM source code license for research is one of the ways to include non-IBM systems in the DEISA consortium and share the files over GPFS. We did the port for the SGI system in Munich some time ago, and NEC has a source code license to give access to the file systems, but I have not yet heard of a real port.

Primeur magazine: Will it also be used in PRACE?

Klaus Gottschalk: As a product, yes. But I do not think they are doing research on it. There are some HPC calls going on, and I know that GPFS is part of some of them. There is a consortium of TU Dresden, HLRS and others that includes GPFS, so we are part of that collaboration. But the project proposal is not approved yet; the deadline to submit it was yesterday.

GPFS has been a hot topic since Oracle announced some license changes to Lustre.

This year we are bound to deliver the large system in the US, and in our view a lot of that scaling work already applies to Exascale systems. We need to deliver a TByte per second of throughput, which is a lot. Somebody told me that the actually used bandwidth of the Internet today is 1 TByte per second. The Internet provides a lot more, but what is actually used is 1 TByte/s. We are bound to deliver that on one system. The system will also have thousands of nodes; so far the limit for such an installation has been about 3,000 nodes in our largest customer installation. It will scale, and we will deliver about 200,000 nodes. So scaling is key, and that is our plan for the next release: we are bringing GPFS to a larger scale than we have had so far. We need to demonstrate it this year.
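As a back-of-envelope illustration of that target (the per-node share and the I/O-server figures below are assumptions for the sake of the arithmetic, not numbers from the interview):

```python
# Back-of-envelope check of the 1 TByte/s throughput target mentioned above.
# The I/O-server count and per-server rate are illustrative assumptions.

TARGET_GBYTES_PER_S = 1000      # 1 TByte/s aggregate file-system throughput
NODES = 200_000                 # planned scale mentioned in the interview

print(f"Average share per node: {TARGET_GBYTES_PER_S / NODES * 1000:.0f} MByte/s")

# In practice the bandwidth comes from a smaller set of dedicated I/O servers.
IO_SERVERS = 500                # assumed number of GPFS storage (NSD) servers
GBYTES_PER_SERVER = 2.0         # assumed sustained GByte/s per server

print(f"{IO_SERVERS} servers x {GBYTES_PER_SERVER} GByte/s = "
      f"{IO_SERVERS * GBYTES_PER_SERVER:.0f} GByte/s aggregate")
```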

Primeur magazine: Do you also have plans to participate in the Exascale software research projects?

Klaus Gottschalk: Yes, we are participating in those too. A lot in these projects is about productivity, so we are providing tools for that. In the past years we have seen more and more huge machines with high peak performance, but even on the largest installations you do not see more than 10% of the peak performance. Getting beyond this is a huge amount of work, so productivity in these areas is key. We provide hints directly from intelligent data-mining systems, pointing you to the places where reordering or some tweaks here and there will give you better scalability.
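The 10% figure is the ratio of sustained application performance to theoretical peak. A minimal sketch of that ratio, with made-up numbers:

```python
# Sustained-vs-peak efficiency, the ratio Gottschalk refers to.
# Both numbers below are invented for illustration.

peak_tflops = 1000.0        # theoretical peak performance of the machine
sustained_tflops = 95.0     # what a real application actually achieves

efficiency = sustained_tflops / peak_tflops
print(f"Application efficiency: {efficiency:.1%}")   # roughly the quoted 10%
```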

Primeur magazine: But on the other hand, while it is of course nice to have high efficiency, in the end people look more at the real performance, the real cost of running an application, and its power cost. Whether you use 100% of the available peak performance or less, people do not really care.

Klaus Gottschalk: Yes, as long as the scalability continues, so that you are still getting more results when you add more processors. Scalability is more important than efficiency. And indeed power efficiency is a hot topic these days.

Here in Germany we have a lot of sites that will exceed 1 MW this year. That is a million Euro per year in energy cost, at least in Germany; we have high energy costs, so everything you can reduce here is real money saved. These national supercomputing sites all talk about energy efficiency in their procurements. LRZ in Munich, for instance, is trying to become the most efficient supercomputing site in Europe. An efficiency factor of 1.1, so 10% overhead for cooling and so on, is not easy to reach.
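The arithmetic behind those figures, with an assumed electricity price chosen to roughly match them; the efficiency factor of 1.1 is what is nowadays usually called PUE:

```python
# Rough annual energy cost for a 1 MW site, as discussed above.
# The electricity price is an assumption, picked to match the quoted figure.

POWER_MW = 1.0
HOURS_PER_YEAR = 8760
PRICE_EUR_PER_KWH = 0.114          # assumed German rate, ca. 2010

annual_cost = POWER_MW * 1000 * HOURS_PER_YEAR * PRICE_EUR_PER_KWH
print(f"IT load alone: ~{annual_cost / 1e6:.1f} million EUR/year")

# The 1.1 efficiency factor: total facility power divided by IT power (PUE).
PUE = 1.1
print(f"Including 10% cooling overhead: "
      f"~{annual_cost * PUE / 1e6:.1f} million EUR/year")
```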

Primeur magazine: A general question: why did IBM leave the top positions in the TOP500 to someone else?

Klaus Gottschalk: It is always a question of projects. You will see the next step coming next year. It is not the number one position per se that we are pursuing; it is more about a project that leads to a top position. You always need to have a project that allows you to do that. But from the planning, you can see we will come back. Blue Waters is a good candidate for that.

Primeur magazine: You are not afraid the Chinese will overtake?

Klaus Gottschalk: You never know, but you have to have profound capabilities to do more than just a single big bang. Continuity takes a much, much more profound effort. You have to deliver. Given our track record, there is much more reliability that things will get done. Compare this with the Earth Simulator: that was a one-of-a-kind system, but there was no possibility for them to do it again right away. It took them years, and it still takes years, to build the next one.

It is a business, and a business has to earn money. If it works out to the number one position, we are happy to take it, but it is not a goal in itself.

Primeur magazine: The Blue Waters supercomputer is of course the next big system. Can you tell a bit about that?

Klaus Gottschalk: Well, there is public information that NCSA has published themselves, so you will basically find all the architecture details there. It is a Power 7 based system, water cooled, no accelerators, running Linux.

Hans Rehm: If you look at IBM's engagement in the HPC market, you can see that we are not betting on one single thing. We have the Power platform, we have hybrid solutions, we have accelerator offerings. We have the x86 architecture. There is a lot of variety we use for specific areas. You might see other things showing up that we do not know yet. The principle is not betting on one horse.

Primeur magazine: Can you tell a bit more about the accelerator?

Hans Rehm: Well, the NVIDIA offering was just announced, based on what we saw as market demand for these solutions. At the moment it is an accelerator solution that fits in the iDataPlex.

Klaus Gottschalk: We can deliver it. It is a proven solution, but it is not our main strategy. Accelerators are a way to bridge some of the gaps, but in total we think high-performance computing will be based on pure system power, not on accelerators. The bottleneck is still getting all the data from memory to these cards. The technology we are looking at is this: if there are specialized cores of some kind, as we did with Cell, then they should be in the same memory context. So no moving of data around; direct access to memory. That is probably something we will see in the next generation of Power chips.
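A toy estimate of why that bus transfer can dominate; the bandwidth, throughput, and arithmetic-intensity numbers below are era-appropriate assumptions, not IBM figures:

```python
# Why shipping data to an accelerator card can dominate: a toy estimate.
# All numbers are illustrative assumptions for a ca. 2010 setup.

DATA_GB = 4.0               # working set shipped to the card
PCIE_GBYTES_PER_S = 6.0     # assumed effective PCIe 2.0 x16 bandwidth
CARD_GFLOPS = 500.0         # assumed sustained GFLOP/s on the accelerator
FLOPS_PER_BYTE = 1.0        # assumed arithmetic intensity of the kernel

transfer_s = DATA_GB / PCIE_GBYTES_PER_S
compute_s = (DATA_GB * 1e9 * FLOPS_PER_BYTE) / (CARD_GFLOPS * 1e9)

print(f"Transfer: {transfer_s:.3f} s, compute: {compute_s:.3f} s")
# At low arithmetic intensity the bus transfer takes far longer than the
# computation itself -- the bottleneck Gottschalk describes.
```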

This follows on from what we are doing with Power 7 now. But you see, we are open: when a customer demands it, we provide it, along with the environment to run these applications, offering solutions so that they can port applications which currently run on accelerators onto future technology. Whether it is OpenMP, OpenCL, or some kind of MPI, we will support these pillars of applications on our hardware. We think that, as in the past, accelerators appear in the HPC market and will stay there for some time. But we also think that, as in the past, the major problems are still around: success depends on the moving of data and on very complicated programming paradigms.

Moving from one generation of accelerators to the next is still difficult. So our strategy is to be prepared, to have everything ready. With Roadrunner we built a very similar architecture, and in case everything moves ahead we are prepared to take these applications onto our systems. Blue Waters is of course the next one you will see, but we are also working on the next generation of BlueGene, which will be BlueGene-Q after the BlueGene-P now in Juelich, and a lot of compute performance will come out of this.

It will have a large SMP node. Currently we have four cores per CPU in a node, doing two threads on each of them. In the future it will be 17 cores: one for the operating system and 16 for the real work, with 4 threads per core. Also some of the interconnect will be updated. So far we have always had a 3-D torus in BlueGene; we are currently evaluating larger dimensions. Whether it is a 4-D torus or a 5-D torus, we will see, and most likely there will be a system in Juelich. So watch out for 2011: there will be one on the TOP500 list.
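A small sketch of why a higher-dimensional torus is attractive: for the same node count, the worst-case hop count (the network diameter) shrinks as dimensions are added. The shapes below are illustrative, not actual BlueGene configurations:

```python
# Worst-case hop count in a torus network of a given shape.
# In each dimension a packet never needs more than half the ring.

def torus_diameter(shape):
    """Maximum hop count between any two nodes in the torus."""
    return sum(side // 2 for side in shape)

nodes_3d = (32, 32, 32)          # 32,768 nodes in a 3-D torus
nodes_5d = (8, 8, 8, 8, 8)       # the same 32,768 nodes in a 5-D torus

print(f"3-D torus diameter: {torus_diameter(nodes_3d)} hops")  # 48
print(f"5-D torus diameter: {torus_diameter(nodes_5d)} hops")  # 20
```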

Primeur magazine: Thanks for sharing your thoughts with us.

Ad Emmen