24 Jun 2011 Hamburg - In the year since ISC10, Cray has introduced a new system, the XE6, announced sales to HLRS and CSCS in Europe, and maintained its strong presence in the upper regions of the TOP500, with three machines in the top ten. At ISC11 we talked to Cray's Barry Bolding about these developments and the road to Exascale.
Primeur magazine: So you want to tell us something about the new system?
Barry Bolding: Last year, when we talked here at ISC in Hamburg, we introduced the Cray XE6. One thing I would like to say is that this product introduction has gone very well. We delivered and installed many systems last year, and now we are growing our customer base. The Cray XE6 is a very popular product. It's very stable and very reliable; in fact, our reliability figures have gone up significantly. Our system reliability metrics typically hover around 99.5%, and now we are selling into the manufacturing space, where you need both high reliability and high performance. We just announced a system sale to General Electric, which is a good example of how Cray is getting back into manufacturing, just as it did in the very early days.
Primeur magazine: And is 99.5% enough there, or do you need a few more nines?
Barry Bolding: No, we do not do five nines of reliability. We are still geared towards performance, and you give up a lot of performance to get those five nines. That is a different marketplace. But we are very happy, and I believe that most of our customers are happy as well. Recently we had one machine go down after it had been up in production for a hundred days. We have also just announced the Cray XK6, a variant of the Cray XE6. It uses the same backplane, the same cabinet and the same Gemini interconnect that we released last year, but we have created a variant of the compute blade and compute node that combines the upcoming AMD Opteron “Interlagos” processor with the NVIDIA Tesla X2090 GPU, which is rated at 665 Gflop/s. Together they make a very powerful compute node. Both node types will run the same Cray Linux Environment (CLE) operating system, and you can blend them in a single system.
Primeur magazine: So you have one with GPUs and another one without?
Barry Bolding: We announced the Cray XE6 and Cray XK6 as two separate products, but we can actually combine them on the same network as a blended system. We announced it that way because we are in the business of scalability, not in the business of selling individual GPUs for testing. Other vendors can sell individual GPUs that people put into laptops or PCs to test applications. That is fine for them, but that is not what we are targeting our systems for. We are building the Cray XK6 to be a scalable production machine, so we wanted to give the Cray XK6 its own name, its own “brand”, rather than make it just a blade option within the Cray XE6. That is really a statement that we want to build scalable systems with it. But you can still mix and match to create blended Cray XE/XK systems. The Cray XK6 will ship in the third quarter of 2011. As I said, the software stack is the same on the Cray XE6 and the Cray XK6, with a few additions for the Cray XK6: you can upgrade the PGI compiler to the PGI Accelerator compiler, it comes with the CUDA software development kit, and the Cray compiler is beginning to implement features that allow our customers to utilize the GPUs.
So you have multiple options. That is really important, because GPUs are in their infancy. We feel users need a multitude of options to compile and run their codes, because even a slight difference in how code is written can cause very large performance changes on a GPU system. In the long run, we really do feel that directives-based approaches are going to be very important as we move towards the middle or the end of the decade. Directives-based approaches like OpenMP, or others that may develop, are important because they allow a single software development project to target multiple types of accelerators, or even non-accelerator systems. At Cray we really think that this type of programming model is going to be important to efficiently make use of GPUs and many-core processors.
Primeur magazine: Isn't it likely that the accelerators will somehow converge into the CPU?
Barry Bolding: I believe that GPUs will incorporate features from the CPU and CPUs will incorporate features from the GPUs. Sometime in the middle of the decade, or slightly towards the second half of the decade, we expect these functions to begin to merge onto single processors. The major processor vendors are talking about this merging of technologies and about the ways to do this.
Primeur magazine: Yes, NVIDIA is also talking about incorporating the CPU into the GPU.
Barry Bolding: Absolutely. NVIDIA has talked in the press about working with ARM. AMD has ways to do this with its Fusion model; in fact, it has already done so in the client space, and the question is when it will do so in the server space. Then of course, Intel’s road map is interesting as well. But right now, we are concentrating on the NVIDIA GPU in the Cray XK6. NVIDIA has a good software stack and good performance, and these are very important elements of a powerful, scalable system.
Primeur magazine: You mentioned that the Cray XE6 is selling well?
Barry Bolding: The Cray XE6 has had very good sales: we have shipped over 300 cabinets of Cray XE6. I believe we now have more than 5 Petaflop/s of Cray XE6 systems installed at various customer sites, including multiple sites that are over a Petaflop/s. NERSC is over a Petaflop/s, Sandia/LANL is over a Petaflop/s, and we also have several Cray XT5 and Cray XT6 sites that are over a Petaflop/s. We also have a number of mid-sized systems in the field.
Primeur magazine: What is the biggest in Europe currently?
Barry Bolding: Once it is installed, it will be the HLRS system at the University of Stuttgart, but right now I believe it is the Cray XE6 “Hector” system at the University of Edinburgh (EPSRC). They are going to upgrade their system as well: later this year, they will move to AMD “Interlagos” processors.
In Europe we have grown quite a bit over the past 12 months. We have installed or upgraded systems at the Royal Institute of Technology (KTH) in Sweden, which has a very large system that it uses for a number of interesting scientific applications. In fact, its newsletter just featured a series of truly cutting-edge applications in the life sciences. We upgraded Hector, as I said earlier. We have shipped several types of systems to CSCS in Switzerland, which has also ordered additional systems: CSCS was announced as the first Cray XK6 customer, has ordered a follow-on to our Cray XMT system, and is upgrading its Cray XT5 “Rosa” system. We also announced that HLRS is getting a Cray XE6 system later this year. In Europe we have a lot of momentum. There are a few more countries we would like to sell systems in, but today there are Cray systems in most of the larger European countries.
Primeur magazine: The sites you mentioned are research sites. Do you also have manufacturing sites?
Barry Bolding: In Europe right now, it is mostly government, research and academia. The manufacturing segment is a new target for us and is just beginning to be an area of growth. We have a Cray XT system at a major pharmaceutical company and, as we just announced, a Cray XE6 at General Electric. We also have smaller CX systems at Horton-Wison and Swift, but this is just the beginning. Our products are once again a good fit for this segment, and now we need to go out and sell some systems.
Primeur magazine: And the question we have to ask here: what about the road to Exascale?
Barry Bolding: Exascale, or ExaFlop/s as a "number", is just like Linpack to us: as a peak number, it does not really mean much. At Cray we are really focused on production scalability. Jaguar has been installed for two and a half years, solving ground-breaking science problems. It doesn't matter whether it is number one, number two or number three on a single Linpack benchmark. What really matters is whether great science has been done on that machine over those two and a half years. I would argue that more good computational science has been done on Jaguar than on almost any machine that has ever been built. That is my opinion.
Primeur magazine: How can you prove that?
Barry Bolding: You would have to look at the number of publications and at the number of production applications running at scale. Jaguar has five applications running at a sustained Petaflop/s or more, so problems that cannot be solved anywhere else in the world are attempted on the Cray XT5 at ORNL. In addition, a number of commercial companies have done ground-breaking work on Jaguar. At ORNL there is the INCITE programme, which also allocates time on their systems to commercial companies. In fact, General Electric has run on the Jaguar system in the past through the INCITE programme, and BMI Corporation ran simulations to optimize the fuel efficiency of trucks. A recent issue of The Economist has an article about that work, in which the company is quoted as saying that it helped bring their product to market two years earlier than expected.
Within the first month after the Cray XT5 was installed at Oak Ridge, we had set multiple records on real production applications. To me, that is productive supercomputing, and that is what we aim for at Cray.
With regard to Exascale, Cray aims to bring production Exascale systems to market. For us, it is not about being number one in the world. It is about providing the HPC community with the best, most productive Exascale system that can be built.
Primeur magazine: I am wondering how you can show which real applications actually make a difference. Some metric would be useful.
Barry Bolding: NERSC actually tracks the scientific papers published based on work done on its systems. That is one metric that might be useful for measuring productivity. The Jaguar system at ORNL has been in production for two and a half years, with more than 2 Petaflop/s and 220 thousand processing cores. You can do a lot of productive work in that time on a system of that size, and I am sure there are many ways to measure that productivity. The K Computer by Fujitsu is actually a very nice design and a great engineering accomplishment, but what will really determine how useful it is to the HPC community is how much productive work gets done on it over the next few years. That will also depend on how many applications can scale. Will it support a few applications running efficiently, or hundreds?
So I am very proud of Cray’s accomplishments over the last 12 to 18 months. Gemini is a great network, AMD processors and now the NVIDIA processors are best in class, and the Cray XE6 and XK6 systems based on these technologies are built for scalability and productivity. This puts Cray in a great position to provide our customers with the products they need to solve their most challenging HPC problems.
Primeur magazine: Thank you for this interview.