Convey on track

4 Jun 2010 Hamburg - At ISC’10 in Hamburg we had a discussion with Steve Wallach and Bruce Toal from Convey Computer. Last year Convey made its first appearance in Europe. How are you doing after one year?

Bruce Toal:So last year we were here with two beta customers. This year we are here with over 21 customers. We moved into production in October last year. We closed our third round of financing, so last July we raised another 25 million USD. And what we also have been doing is continuing to provide more solutions to more market segments. You have seen our recent announcements around bioinformatics. We expect that at future shows we can talk about different industry segments. We also strengthened our staff: we added more salespeople here in Germany and Switzerland, and we are looking into Asia. So we are doing more sales, growing our customer base, and we will begin working on our next-generation product as well. So everything is trending up very nicely.

Primeur magazine:The number of customers, is that in line with what you expected?

Bruce Toal:Yes, it is.

Primeur magazine:Can you tell a bit more about some of the important ones?

Bruce Toal:The ones I can mention, I will. At this show we announced the Virginia Bioinformatics Institute, which is part of Virginia Tech, and the University of South Carolina. Both are recent customers. Then there are the University of Illinois, Stanford University, Lawrence Berkeley Labs, and the University of California. And in Europe: the Karlsruhe Institute of Technology and Imperial College London.

Primeur magazine:Are they only used for testing and exploring or also already to run production?

Steve Wallach:Some are beginning to go into production.

Primeur magazine:And if they do, are they then ordering more?

Steve Wallach:Yes, they do. Because one of our benefits is how we scale, how we do the clustering. The clustering and the networking are done on the x86. We do not need to do anything special. It is not as in the time of Convex, when we had to build our own chassis. The applications scale well: it is basically a clustering approach with MPI. And that is pretty standard.

Primeur magazine:So it worked out as you planned.

Steve Wallach:Yes, but it is not that it was easy.

Bruce Toal:It is hard work, but it is all going to plan.

Steve Wallach:We are a computer company. And from day one we put in the infrastructure, the people, to be a computer company. What we produce is not something like an add-on: “Can I put this in the PCI slot of my computer?” If you do that, you are not a computer company. People buy solutions from us and not a piece of hardware. We are beginning to get traction. People are saying: “These guys know what they are doing, and they are doing it all over again.” At Convey, it is the same people as at Convex. For me the nicest aspect of the company to this point is the customers that were previously customers of Convex and that say: “I like working with you people, I know I can trust you. You know what you are doing.” Which helps a lot. They believe our road map, our strategy: “We have seen you do this before, 15 years ago.” There are a lot of people that say similar things to Convey, but they do not have credibility. Everybody can present a road map, but your team must be able to execute it.

The reason why this is all so important is that we are focused on building a single node. We are really building the best price/performance application engine that is rack-mounted. That is where our focus is.

Everywhere here at the conference you will hear about Exaflop/s and Exascale. When we had Tera it was about how are we going to Peta, and now that we have Peta, it is about how are we going to Exa. I keep using the term, “it is deja vu all over again” for this. But it really is true. However, to go from Peta to Exa will not be as easy as it was to go from Tera to Peta.

Primeur magazine:If you listen here at the conference, it will only take ten years, Jack Dongarra and others said in their presentations. They can do it, they say.

Steve Wallach:Ten years, and maybe five billion dollars of investment, and a programming model that no one knows how to make work.

Primeur magazine:They can make it work. They cannot guarantee, but they are pretty sure, they said.

Steve Wallach:I have been to these Exascale workshops too. But it is guys like me who have to make it work. But I do not want to get into too much detail.

Primeur magazine:What they say is that you will have a computer with a billion threads and a billion cores. And we just have to make the programming models to work for that.

Steve Wallach:So let us just take that, because that statement in itself is correct. I was at the same workshop. When that slide was put up at the workshop in December 2009 in San Diego, the compiler people said: I have no idea how to make this compile to a billion threads. Literally they said: I am clueless. These are some of the smartest people I know in the industry. And there is a report coming out from the workshop. I have a preliminary version, and I am quoted saying: that is the wrong way to do it. Instead of building a billion lightweight threads, let us build some application-specific processors. If my applications can be 100 times faster than a lightweight thread, then rather than a billion I may only need a million, or whatever. And when I need a million, we already understand today how to do that. And the other aspect is it will be more reliable. When they did the analysis of a billion threads, the MTBF was something like one hour. You cannot have a machine that fails every hour.

Primeur magazine:So they say you have to build that into the system software. That it can be fault tolerant at that level.

Steve Wallach:Right. But what I am trying to share with you is that when you ask “how do you do that?”, the answer is: well, we have ten years to figure it out.

Primeur magazine:So that is true. I also asked that question, because I do not see how they can do that. Perhaps they can.

Steve Wallach:The whole point about this is that companies like Convey are going to sell this. But there seems to be this notion where people say: “We can make a thread ten or a hundred times faster than it is, and then you need ten million threads instead of a billion; that is simple mathematics.” And if they can do that, it is also more reliable, because MTBF is inversely proportional to the number of transistors. On the software side, if I have one hundred times fewer processes it is simpler, and it also probably uses less power. The studies show that the power for an Exascale system could potentially be anything from 10 MW to 30 MW. The numbers vary. But if you can save power, from say 30 down to 20 MW, that can mean the difference between being able to provide enough power or not. And electricity is becoming more and more expensive, not less. So I believe in this. No one has ever gotten anywhere near a billion threads. You can say BlueGene has something like a million. So that is three orders of magnitude off. It is not necessarily that simple. If enough smart people work on it, they probably will find a solution. But I am not sure that what they are talking about today will be the solution in ten years. It is difficult enough to predict what is going to happen in five years, let alone ten years.
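
The arithmetic behind this argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the one-hour MTBF at a billion threads is the figure quoted in the interview, and the model assumes independent failures with the same failure rate per thread unit however fast that unit is. A real application-specific processor may need more transistors per thread, which would reduce the gain.

```python
def threads_needed(baseline_threads: int, speedup_per_thread: float) -> float:
    """Threads needed for the same aggregate throughput when each thread
    is `speedup_per_thread` times faster (ideal scaling assumed)."""
    if speedup_per_thread <= 0:
        raise ValueError("speedup_per_thread must be positive")
    return baseline_threads / speedup_per_thread


def system_mtbf_hours(unit_mtbf_hours: float, n_units: float) -> float:
    """MTBF of a system of n identical, independent units in series:
    the system fails when any unit fails, so MTBF scales as 1/n."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return unit_mtbf_hours / n_units


# Figures taken from the interview.
BASELINE_THREADS = 1_000_000_000   # "a billion threads"
QUOTED_MTBF_HOURS = 1.0            # "the MTBF was something like one hour"

# Assumption: per-unit MTBF implied by the two figures above.
UNIT_MTBF_HOURS = QUOTED_MTBF_HOURS * BASELINE_THREADS

if __name__ == "__main__":
    for speedup in (1, 10, 100):
        n = threads_needed(BASELINE_THREADS, speedup)
        mtbf = system_mtbf_hours(UNIT_MTBF_HOURS, n)
        print(f"speedup {speedup:>3}x: {n:,.0f} threads, system MTBF about {mtbf:,.0f} h")
```

Under these assumptions a hundredfold faster thread reduces the thread count from a billion to ten million and stretches the system MTBF by the same factor of one hundred.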

Primeur magazine:Anyway. The simpler the machine the more chance you have to actually build it.

Steve Wallach:And that is why I stood up and said: application specific. I need one hundred times less of everything. I did not say how I am going to do it. But it is mathematics. And what I believe, and now I am talking Convey, is that since we take a very application-specific approach to the solver, whether solving speech, or seismic, or whatever, this approach, as we are beginning to see, uses less power and could potentially be smaller. There are still some issues. I am not saying we solved them all, but it is showing for a certain class of applications that it makes sense. Of course, someone could say: well, if you build an application-specific thing, you will need a hundred different designs. However, it turns out that at UC Berkeley they developed this notion of a "Motif". They say: there are really only 13 architectural approaches for major applications. For instance one would be: solving dense matrices. Another is solving sparse matrices. So it is at that level. It is not the name of an application. It is the framework for a class of applications.

Primeur magazine:So it is a kind of template you could say?

Steve Wallach:At Convey people were asking: How many different "templates" do you think there are? This was before the Berkeley stuff. And we were actually talking amongst ourselves. I said: maybe ten. Ten and thirteen is close enough. For me it is. The way we came up with the number was not really a guess. At Convex I used to use the term: nifty fifty. That meant that, even if you have a general purpose machine, fifty applications accounted for 90-95 % of the cycles. Just like you have in engineering the 20 - 80 rule or the 10 - 90 rule. So if you said structures, you would say: NASTRAN, ANSYS, ABAQUS. If you said seismic, it was FFTs and convolutions, and since every seismic company had a contract with Convex, I can tell you, we had one library they effectively used.

So of the fifty, if you sat down, you could say: well, it is perhaps ten unique ways: dense matrices for this, sparse matrices for that, etc. So it is not like you build a NASTRAN machine. You build a finite element machine. That is what people do not understand. And if you do that you can capture the majority of applications. And if you can get, as we see with bioinformatics, two orders of magnitude more performance, now that is significant! And it is not just throwing gates at floating point operations. You are intelligently configuring a computer to solve an application.

What we are doing with Convey is not just FPGAs. We are using FPGAs as a vehicle. We also have a high speed memory system. If we did not have that fast a memory system we would not be sitting here. And because of the way that FPGAs are mostly used, we have to explain to people that this is not a PCI card, or a PC plugin. You do not have to program the FPGAs. The most common question is perhaps: who is going to do the Verilog programming? So we will see. And I think the one reason we got a lot more respect is that we do have a team of these guys that are awkward. We also have industrial connections with Xilinx and Intel. If we did not have our relationships with those two we also would not be sitting here, whether it is licensing or early access to FPGAs.

Data intensive computing

I think one of the most interesting things is actually one of the fastest-growing areas in HPC, which is called data intensive computing. In general it involves no floating point. We are beginning to get some traction here: we may have one of the best ways of doing data intensive computing, because our memory is good and because certain things that we have not disclosed yet make that model of computing go very well.

Bruce Toal:I just go back to the discussion we had yesterday with a customer who was telling us that, in his experience, with his code (weather codes), on a vector machine they got maybe 20% of peak, and on a Xeon they get between 3 and 5 % of peak performance. And we said: we think we will get 50% of peak performance. That is just another illustration of how, with our memory system, we reach a significantly higher percentage of peak performance.
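
To make the percentage-of-peak comparison concrete, here is a minimal Python sketch. The efficiency fractions are the ones quoted by the customer above; the peak value of 100 GFLOP/s and the function names are hypothetical, chosen only to show the arithmetic. The systems mentioned do not share the same peak rating, so the comparison holds only per unit of peak.

```python
import math

# Fraction of peak quoted in the interview for the customer's weather codes.
# The Xeon figure is a range; the Convey figure is an expectation, not a measurement.
FRACTION_OF_PEAK = {
    "vector machine": (0.20, 0.20),
    "Xeon": (0.03, 0.05),
    "Convey (expected)": (0.50, 0.50),
}


def sustained(peak: float, fraction: float) -> float:
    """Delivered performance for a given peak rating and efficiency."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return peak * fraction


def peak_required(target_sustained: float, fraction: float) -> float:
    """Peak rating needed to deliver `target_sustained` at a given efficiency."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return target_sustained / fraction


if __name__ == "__main__":
    hypothetical_peak = 100.0  # GFLOP/s, an assumed value for illustration only
    for name, (low, high) in FRACTION_OF_PEAK.items():
        lo_val = sustained(hypothetical_peak, low)
        hi_val = sustained(hypothetical_peak, high)
        print(f"{name}: {lo_val:.0f} to {hi_val:.0f} GFLOP/s sustained")
```

At an equal peak rating, 50% against 3 to 5 % means roughly ten to seventeen times more delivered performance, which is the point being made about the memory system.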

Steve Wallach:Here is another interesting thing: some of our biggest users are people that have Fortran. With most of the new technology, what they have is C and C++. I am not saying that that is the wrong decision, but I will tell you there are certain customers that, if we did not have Fortran, would not look at us. Now they say: “I have not seen you in ten years, but I am glad to see you. I have a million lines of Fortran, and if you would tell me to convert that into C, come on Steve, you know me. Take this code, which took us twenty years to get working, and you want me to convert it to a new language? It ain’t gonna happen.” And a lot of people in that exhibition room over there do not have Fortran compilers, because, and do not get me wrong, new stuff is generally written in C. But a lot of legacy stuff is written in Fortran, and if we did not have a Fortran compiler, I do not know how many customers we would not even have had a discussion with.

Bruce Toal:We had one very young user, a potential user. He had questions like, ‘do you have a Java compiler?’ It reminded me of the time when people used to ask whether we had a Cobol compiler. We said: we think you should spend most of your time converting your application from Java to C, and then you can talk to us later. But so many students are just learning C and C++ or Java. And when it is just Java, we are in trouble. Because that is not a very efficient language.

Steve Wallach:The talk that I gave on Monday was actually three quarters nothing to do with Convey. One of the things I said was that when people come out of school, they definitely do not know Fortran. They know Java, Javascript, Ruby, and the reason for that is that companies like Microsoft and Google are hiring these people. So if you are in school today and graduate, it is more likely you are going to work for Microsoft, Apple, or Google, and then you need Java and Javascript. I think that is going to hurt HPC, because these languages are not very useful for HPC: you are not going to do Navier-Stokes in Java.

Primeur magazine:That is difficult. But how can you get people to do that? Because it is not cool to do something where numbers come in and numbers go out. And with Javascript and Python, you make flashy websites.

Steve Wallach:But as Thomas Sterling said: the golden age of HPC was late eighties, early nineties. And it is not happening again. Even if they say the Google data centres are big. They are not worried about bi-section bandwidth on the network, because it is solely transaction processing.

Primeur magazine:But even in the golden age of HPC, HPC was not that big.

Steve Wallach:You are right, but it had a certain visibility in the United States, Europe and Japan, and it would be in the news: people were doing crash simulations on cars. Now what is interesting is, with the Chinese now doing things with Linpack, will that cause a reaction? I am posing this as a question. Will the US, for example, react to that the same way they reacted to the Japanese Earth Simulator? My own opinion, my advice, would be: it is just another point in time. Let us not worry about it, let’s get on to Exascale computing. But that is not my decision.

Primeur magazine:But still I do not understand your point about HPC decline. This conference is growing like mad.

Steve Wallach:Correct.

Primeur magazine:Europe is putting hundreds of millions of Euro in supercomputing, the top segment in the supercomputers is growing according to IDC. It has grown 25% over the last year, so what is the problem with HPC? It is doing well, isn’t it?

Steve Wallach:Maybe you are right.

Primeur magazine:So maybe it is already the dawn of a new era of HPC.

Steve Wallach:One of the things is that there are fewer vendors.

Bruce Toal:The problem is that you are getting so commoditized. You can have that type of growth, and maybe that is good, but it is growth without the expertise.

Steve Wallach:I think the difference was in the late eighties, early nineties, companies could begin and design their own processor. Convex started that way.

Primeur magazine:But today you can begin and design your own data centre. So isn’t that progress?

Steve Wallach:Of the TOP500, about 400 use Intel processors. So if 80% use Intel microprocessors, then the only thing left to make a difference is the interconnect. It is an interconnect architecture these days. It is not a computer architecture. I am not saying that is bad, but back in 1990 it was MPP this and MPP that; it was an architectural discussion. It was also interconnect, but it was balanced. This is a vector machine with 4 pipes, versus 8 pipes, etc. Now, just to use it as an example, I take an Intel board. Do I put a GPU in the PCI slot or not, one that I buy from someone rather than design myself? Do I use Ethernet, Infiniband, or something proprietary? And then I am going to take this one page of Fortran and spend as much time as I can to make it run fast. I am not oversimplifying, but that is pretty much what it is. You cannot change the floating point unit in an Intel microprocessor, even if you wanted to. You cannot change the memory architecture. So I think that is where a lot of people see problems, especially when we go to Exascale, if you want a billion threads. So as I said, you are not going to put a million threads on this. The only consensus is that you need an optical interconnect: because of the distance and bandwidth, optics is the only way to do it. But we will see.

Bruce Toal:Maybe by next year, we will have made more progress in Exascale, the industry. But we will continue to make systems with the fastest thread for an application. We will continue to adapt the hardware to the applications.

Primeur magazine:In these bigger machines, they do not only look at the speed, but also at the power consumption. Is there a way that you can monitor that?

Bruce Toal:We do not get questions like that. We try to go as fast as possible. And we use efficient FPGAs.

Steve Wallach:The way we handle this is by having our unit air-cooled. That is one reflection of power design. It is a new design. We asked people how much power they can have in a rack for example, and we want to make sure that we have the maximum density of air-cooled chassis.

Software

Steve Wallach:We have the solver software, which is as important as the hardware. So what percentage is hardware and what percentage is software? It is now nearly fifty/fifty. So our compilers are very good. I tell you one thing: other companies, when they need compilers, go to a compiler builder.

There are always compiler bugs. We are not having that debate with our customers. Or if they want a feature that is not in there, and we consider it important, we call the team in and schedule it. If it is a third party compiler, it would be a three-way negotiation. And I can tell you from Convex: we talked to a lot of the same customers, and when we competed in the eighties against these various people, one selling point, one reason why they chose us, was the compiler support. I will give an example. We had MSC-Nastran on the system. Multi-million lines with every variation of dialect of Fortran. We compiled it and there was some number of bugs. And they all had to do with the syntax of the language, because they had something like Fortran 66 constructs in Fortran 77. And the 77 standard says you do not have to run deprecated features. Nastran said: we understand, but do you want to run our code? The standard is our standard, not the Fortran standard. So there we go. We showed them: do you see the things that did not compile? Now there is a switch compile-msc-nastran. It was not too difficult to do.

The reason I bring that up is we had some discussions about the type of applications and when the people heard it was the Convex team, their reaction was: the Convex compiler was the best compiler we ever used. Literally this happened a month ago and they remember that: OK. You are right, in a week we will take it and it just blew their mind that also the optimizations did work. Honestly, I believe, that most people do not understand. And this is now my thing. If you go to the IBM booth, they have got an 8 Tflop/s thing that weighs 110 kilos with pipes and cooling. Who knows how much that costs without the chips. It is very sophisticated.

Now take our compiler. That has more sophisticated technology than ever. How do you show that? That is one of the problems. IBM can showcase this 110 kilo thing with water-cooled pipes going in and out. And if I were IBM and had it, I would showcase it too. But how can we showcase our compiler technology?

Primeur magazine:It is an exhibition, so if you have something to show you do it.

Steve Wallach:But our compiler, from a technology perspective, is more difficult to do. It is more innovative, but unfortunately you have to show people by compiling code; I cannot put it on a platter here and say: look how neat that is.

Bruce Toal:We are all material beings of a kind: we like to touch things.

Primeur magazine:You could hire some of those Python, Javascript guys that can make some nice flashy animation of it.

Steve Wallach:The reason that Google is so successful is that they figured out how to write distributed software for query processing. And then of course, managed the business, and whatever. But the way to successfully demonstrate the software is to successfully build the company.

Primeur magazine:Thanks for sharing your thoughts with us.

Ad Emmen